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SYSTEM FOR FINDING DIFFERENCES BETWEEN TWO COMPUTER FILES AND UPDATING THE COMPUTER 
FILES 

5 ' - 

BACKGROUND OF THE INVENTION 

This invention is directed toward apparatus and methods for 
updating existing computer files. More particularly, the invention is 

10 directed toward methods and apparatus for identifying only the 
portion of the file to be changed, forming a patch program to 
implement the change at the bit level, and forming a Patch Apply 
program to implement the desired changes in the existing file 
thereby forming a revised file, meaning a NEW or updated file. 

15 ."-- ■ ■■ : -' - . - : ' " - - 

BACKGROUND OF THE ART 

- The changing or "updating" of files '* is a normal process in 
computer science. Application files are routinely updated, as 
technology advances and improvements are developed. Application 
2 0 files are also frequently updated to eliminate bugs that are found 
during usage. ^ Data files are typically modified as new data are 
acquired, or old data are determined to be invalid. Files comprising 
text are also routinely modified for numerous reasons well known in 
the art. 1 r: '■ " 

2 5 Traditionally, changes in files have been' implemented by - 

creating a completely NEW file containing all desired changes, and 
distributing this full, modified file to all users to be downloaded to 
replace the existing file. Distribution of a completely new version of 
- the file is costly to the provider. Reinstallation of the new version is 

3 0 costly to the end users. Often, the initial version of the file is lost, 

: and cannot" be retrieved for economic or technical reasons. 
Reinstallation is bandwidth or media intensive for the end user. 

File "patching" techniques have been used in the prior art to 
revise existing files. The administration of these techniques is 
3 5 complex, and installation of the patch is often as involved technically 
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and- economically as the installation of a fully revised version of the 
file. 

In view of the previously described methodology and prior art," 
• an object of this invention is to provide a system to modify or patch 
5 an existing file at the bit level and only in those areas of the file 
requiring modification. 

Another object of the invention is to provide a system for 
easily building a patch, and easily installing the patch. 

Yet another object of the invention is to provide a system for 
10 finding the difference between an existing computer file and an 
updated version of the file using a fixed amount of memory, and 
using these differences to implement the desired update of the 
existing file. 

Another object of the invention is to provide a patch system 

1 5 which works with a single file, with a group of files, with directories, 

and with directory trees. 

Still another object of the present invention is to provide a 
patch system for an existing file which can be used to fix bugs, 

2 0 while providing an error checking system and a backup system 

which allows the user to restore the original existing file. 

Yet another object of the invention is to provide a system with 
which multiple changes or patches can be made in an existing file 
with a single application of the system. 

2 5 There are other objects and advantages of the present 

invention which will become apparent in the following disclosure. 

SUMMARY OF THE INVENTION 

The first step in the computer file update or patch process 

3 0 involves the building of a Patch File. The existing or original file, 

which will be referred to as the OLD file, and the revised file, which 
will be referred as the NEW file, are input into a Patch Build program. 
The differences in the OLD file and the NEW file are determined by 
the Patch Build program, and this information is output by the Patch 
3 5 Build program as a Patch File. Operation of the Patch Build program 
will be disclosed in detail in a subsequent section. The changes 
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3 

required to convert the OLD input file to the NEW input file are 
transferred as the Patch File. 

The next step in the computer file patch involves the 
distribution of the Patch File, along with a Patch Apply program, to 
5 the end users so that the OLD file (wherever located) is^ efficiently 
converted ta the desired, updated NEW - file. The OLD file and the 
Patch, File are input .by the end user into, the Patch Apply program. 
The, Patch . Apply program changes, ,at the bit -level, only the portions 
of the OLD file required to yield the_ desired file update. The updated 
10 file is . therefore output from, the Patch Apply program yielding the 
desired NEW .file at the user level. < i: ~ * 

. ^ JBy, distributing only .the Patch File and Patch Apply program to 
the end .-.users, ,the desired , file update can be implemented by the end 
user . with, . maximum .. operational and- economic efficiency. 
15.. Furthermore,. the update ; is , implemented - with; numerous safety 
. features; -including > ,(1) automatic verification .that* the correct; files 
... have-. ( been used and- that the patches have been* built and ^ applied 
...correctly,. /(2). .automatic check for sufficient disk space, (3) restart 
. capability;. after-, power failure, (4) backup of any files ^affected Jby a 
2 0 patch, and (5) the ability 4o reverse , the patched file or an entire f 
system to the prior condition. . 

BRIEF DESCMP OF THE DRAWINGS 

. , ,So -.that the , manner in which* the ..above recited features, 

2 5 ; advantages and objectives of the present invention are. attained and 

can. be understood in detail, more particular description of. the 
. invention, briefly, summarized above, may be had by reference to the 
v appended ( drawings. It is noted,, however, that the appended 

drawings illustrate only .typical embodiments of the invention and 

3 0 therefor not to be considered limiting in scope, for the invention may 

admit, to other equally effective embodiments... / ■ 

.-Figs., .la and lb, .disclose a conceptual overview of the invention; 
Figs. 2 is a flow chart of the patch build program used -to create 
the Patch .File; . v ... iy • / - ; 

3 5 .. Fig. 3 is ^a patch apply flow chart forming a NEW file; 

Fig. 4 illustrates a .string table in the preferred embodiment; 
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Fig. 5 is a flow chart of the preparation of the string Table; 
Fig. 6 is a match table optimization flow chart; and 
Fig. 7 is a patch file encoding flow chart. 

5 DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Figs, la and lb present a conceptual overview of the invention. 
For purposes of discussion, it will be assumed that software updates 
or revisions are distributed (by mailing a disc, by downloading or 
otherwise) to multiple end users from a central organization location. 
10 As an example, a software company may formulate an updated file 
version, and create the associated Patch File (along with the Patch 
Apply program), at its headquarters facility. This will be referred to 
as the "central" facility. Copies of the Patch File and Patch Apply 
program then are distributed to the company's customer user base so 

1 5 that the customers can update their existing files at their locations. 

These will be referred to as "end user locations". The NEW file used 
to create the Patch File at the central facility will sometimes be 
referred to as the NEW "template" file in order to distinguish this 
-^^c.ap^=oi^he^NE3&^e^ 

2 0 at remote locations by conversion of remotely distributed OLD files. 

Fig. la conceptually illustrates activities at the central facility. 
The original file, which is referred to as the OLD input file 20, is input 
into a Patch Build program 23. The revised file, which is referred to 
as the NEW input file 22, is also input into PATCH Build program 23. 

2 5 The differences in the OLD input file 20 and the NEW input file 22 

are determined by the Patch Build program 23 as will be detailed in 
subsequent sections of this disclosure. The output of the Patch Build 
program is a Patch File 36. The Patch File 36 contains only the 
changes required to convert the OLD input file 20 to the desired NEW 

3 0 input file 22, and is readily distributed to end user locations. 

Fig. lb conceptually illustrates activities at one of typically a 
plurality of end user locations. The Patch File 36, along with the end 
users version of the OLD input file 34, is input into a Patch Apply 
program 31. While the OLD files are identical, the numeral 20 is used 
3 5 to designate this file copy at the central location and the numeral 34 
is used to designate the duplicate copies at the end user locations. 
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The Patch Apply program 31 Is typically created at the central 
facility and distributed to the end user locations. As illustrated in 
Fig. lb, it converts the end users OLD file 34, with the help of the 
Patch File 36 into the desired, updated NEW file 40 at the end user's 
5 facility. The Patch Apply program 31 changes, at the bit level, only 
the portions of the OLD file 34 required . to yield the desired file 
update as will be described in detail in a subsequent section of this 
disclosure. The resulting NEW file 40 is, therefore, output from the 
Patch Apply program 31 and is the same : as the NEW file 22 at the 

1 0 central location, but it is identified with the numeral 40 to emphasize 

that the conversion is made at. the end user's facility. 

Fig. 2 illustrates the patch construction process in more detail. 
Elements of the Patch Build program 23 are enclosed within* the 
broken line boundary as indicated. The OLD input file 20 is input- to 
15 a string table coding circuit 12 forming a string .table output ^o. a 
matching circuit 14. Both circuits are explained . in detail below. The 
match table circuit has an input of the entire NEW input file <22;i. in 
conjunction with the. stored string table 24, the- match table circuit 
produces the match table 26, which- is. retained in a suitably sized 

2 0 match table storage means, a conventional, memory , module. The 

operations of. string table circuit coding and .match table coding 
alternate until all of the OLD input file 20 has been processed by : the 
string table, circuit 12, and the stored string , table 24 has been 
processed into the match table circuit ,14. . The stored ^ match table 

2 5 data is output to the match table optimization circuit 16 to alter it. 

, When this process is completed, the OLD input file 20, the NEW File 
22 and the match table memory 26.. are input to a patch file encoding 
circuit 18 which serially passes the completed patch file 36 to a patch 
file output means 28. " 

3 0 Fig. 3 illustrates the NEW file generation process at typical 

multiple end user locations. Elements of the patch apply program 31 
are enclosed within the broken line boundary as indicated. The 
patch file 36 is taken from the input means to a patch file decoder 
30, which produces an auxiliary table 38, retained in the auxiliary 
3 5 table memory. Concurrent with this process, a NEW file construction 
converter circuit 32 reads serially the portions of the auxiliary table 
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38 from memory while it is being constructed, as well as the input 
patch file 36 and the OLD input file 34, and then serially sends the 
NEW file 40 to an output means 41. 

Referring now to Fig. 4 and Fig. 2, the string table 24 is a 
5 modified 8-way B-tree containing a plurality of nodes of five 
different types. As shown in Fig. 4, a root node 42, only one is 
needed, contains up to seven keys (labeled Kl through K7) and up to 
eight child pointers (labeled CI through C8) which identify an 
interior node 44 or a leaf node 46, and an empty "parent" node 43, 

1 0 all in the root node 42. Each interior node 44 (there may be none or 
a plurality), contains the same information as the root node 42, but 
with the addition of a parent pointer. The unique root node 42 and 
interior node 44 contain a child pointer identifying the interior node 
44 in question. Each leaf node 46 (there may be none or several) is 

1 5 identified by its location at a fixed distance (measured by the 
number of child pointers that must be traversed to it) from the root 
node and contains up to seven keys (again labeled Kl through K7) 
and corresponding list pointers (labeled 11 through 17), which each 

<--e^^48»=^g»4ist.^te i'minator 50; al so - the . game pa r e nt— 

2 0 information as the interior node and an additional field denoted LP (a 
list entry 48 or list terminator 50) corresponds to one of the keys in 
one of the parents of the leaf node 46. The method for associating 
each key in an internal node 44 or root node 42 with an LP field in a 
leaf node 46 is described in the subsequent section entitled 

2 5 "operation". Each list entry 48, of which there may be none or a 
plurality, contains an instance field 49' (labeled i), which identifies a 
location of the corresponding key in the OLD file, and a pointer field 
49 (labeled P), which identifies a list entry 48 or list terminator 50 
corresponding to the next instance of the same key. Each list 

3 0 terminator 50, of which there may be none or a plurality, contains 
the same instance field as a list entry, but is identified by an empty 
pointer 50' (labeled x) indicating the end of the list of instances of 
the corresponding key. 

The match table 26 (shown in Fig. 2) is a rectangular array with 
3 5 four columns and rows equal to the number of "chunks" in the NEW 
input file 22, where a "chunk" is a fixed-size portion of the file 
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(required to be a power of 2) and is an operational parameter of the 
overall patch file construction process illustrated in Fig. 2. Table 1 
illustrates a match table. 

5 " . TABLE 1 ... ■ ,. 



. Match table 



I;l .- Li 






I 2 L 2 


_ RL2 


-s 2 


I 3 . L3 


RL 3 


S 3 . 




• 


s 4 , 

• 


* • 

* • " * 


r 

RL X 


# 

S x ' 



. The - four, columns labeled I, L, RL and S (and all encode parameters) 
.„ relate to a possible : match for this "chunk" in the OLD Jnput ;file^20. 

The <L^ column . identifies- the number of consecutive: characters. in.the 
, OLD -Input file 20 starting- at the position identified by the -I column 
20 t that match : >(at , least, approximately) the .characters; startingr-at^ the : ; 
, - beginning of the,, "chunk'^n the : NEW Input; file -22. - : ^ The .■ RL rcplumn 
..- identifies -*he number^ of consecutive characters in the OLD ..Inputy file 
■> ^O^bef pre ..Jhe.-. ^pp.sitip^ identified, in the, i column, that match , (at least 
:■, , approximately Jhe .characters before this;, "chunk" in the - NEW Input 

2 5 file 22. If no match has been located for this, '"chunk" Lthen the, L .and 

RL columns both contain 0. If this "chunk" has been identified as a 
.,. candidate -for, special . handling ^escribed : in : -the subsequent 
"Operation" ,rthen the RL column will contain one of several, possible 
special handling markers .(also described :in a subsequent section). 

3 0- : The S column .contains;. ;differenti values, at .different , times, c During, the 

. .match j table preparation shown in Fig.. 2, .'the S column ^contains an 
estimate.- .-of-,, the^ number of characters required to encode this 
approximate -match, -together, with the mismatches that occur , within 
it. During the match -table Optimization process; 16 shown ; - in .Fig. 2, 
3 5 the S column is gradually converted :to an -estimate of the total 
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number of characters required to encode the patch file from this 
"chunk" to the end. 

Fig. 8 of the drawings shows in broad sweep the operation of 
5 the match table preparation. Referring to table 1 given above, Fig. 8 
includes a first set of prospective or tentative matches. At the top of 
Fig. 8, assume that approximate matches have been indicated in the 
series of matches which are L,...L X . These matches may overlap, and 
Fig. 8 intentionally shows such an overlap. This occurs at a 
1 0 preliminary state of affairs. This occurs before optimization. 

As shown in Fig. 8, the preliminary stage may well include 
several tentative matches in adjacent blocks. Without regard to the 
blocks, without regard to the length as measured looking to the left 

1 5 or looking to the right, without regard to the values of L| or the value 

of RL,, there may be an overlap at an early stage of approximation. 
At the end however the approximation is terminated so that the 
adjacent values of L k and also L k +jare adjusted so that overlap is 
==£emo,v^d*==^^ can 

2 0 be obtained, Fig. 8 shows a representative file and also shows the 

overlap of adjacent blocks. At the conclusion of the optimization 
process, the overlap is eliminated. In the upper part of fig. 8, the 
brackets are shown with an overlap while the optimization process 
removes the overlap so that the entries in the match table L k and 

2 5 L k +i no longer overlap. 

Based on the foregoing it would then be observed that the 
operation of the optimization process ultimately continues until the 
position identified in the I column builds into the entire file has been 

3 0 handled and matches are now evaluated and are optimized. Since 

precise definition then prevails at the end of the process, the content 
of Table 1 is simplified. Effectively, of Table 1 is reduced to a series 
of entries which still includes the I column. The aggregate length of a 
given entry will then be different, and the S column will likewise be 
3 5 modified, all is explained below. 
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Fig. 5 is a flowchart that describes the match table preparation 
14. This process (as shown in Fig. 2) uses a completed string table 24 
and a NEW -input, file 22 as inputs to form a prepared (or updated) 
match table 26 output. The method proceeds by passing the NEW 
5 : input file 22 t backward, a. "chunk" at a time. Recall that a "chunk 1 ' is a 
fixed-size portion^ of the file (size is a power of 2) and is an 
operational parameter of the overall patch file construction process. 
The match table is initially set to "0" at step 60 and a set pointer is 
set to the last "chunk" of NEW input data at step 62. , As each "chunk" 
10 is processed, it is ..checked at step 64 to see if the string of input 
"chunk" has been exhausted. If exhausted, the. process is stopped at 
step 65.. The next, step in .the method is to check- this ^chunk'V for 
special handling at step 66. If special 'handling is warranted, .this, is 
marked in the match table rat step 68 and the entire contiguous set. of 

1 5 "chunks" .to , which the special handling applies is marked and skipped 

at tstep 72. * This .step will be„ expanded In : the ; subsequent section 
, entitled, "Oper ati on". , If special handling is not warranted, the~ ^string 
. table -24 is consulted to locate all the instances in the current portion 

of. the, OLD Input : ..file _ 20 : that exactly matches the current "chunk" of 

2 0 .the NEW Input file ,22. Each instance is examined at step < 70. and 

. .extended r forward . and backward to locate the largest current 
. (possibly _ approximate) match, where "largest" means the * longest 
.match ; in the forward direction with the length in the reverse 
direction, -used- , to v break ties. . The largest current match is . then 

2 5 compared to the match for this "chunk" already marked in the match 

. table (from processing previous portions of the OLD Input file - 20)., If 
.. , the largest current match is longer in the forward direction than the 
previously marked match or has the same forward length (plus a 
. longer .reverse -length), the largest current match is marked in the 

3 0 match table and all "chunks" -contained , within, the extent of the 

reverse portion of this match are marked .and skipped at step 72. If 
there . is .no current match found, or if the largest current match is 
smaller than the previously marked match, then the match table is 
not altered for this "chunk." In this case, if approximate matching is 
3 5 being used (or. if there is no previously marked match for this 
"chunk"), the NEW Input file 22 is moved to the next "chunk", in the 
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reverse direction, and again checked at step 64 to see if the input list 
is exhausted. If exact matching is being used, then all "chunks" 
contained within the extent of the reverse portion of the previously 
marked match are skipped. This entire process is repeated until the 
5 NEW input file 22 is exhausted as determined at stop 64. That is, the 
input file is exhausted when the first "chunk" of the file has been 
processed. 

The match table optimization process 16 is shown as a whole in 
Fig. 2. Details of the steps in match table optimization are shown in 

1 0 Fig. 6. This process operates entirely on the match table 26 without 

reference to any other data structures. The method begins by setting 
the current size to zero at step 80, and then proceeds by passing the 
match table 26 "backward" beginning with the row corresponding to 
the last "chunk" in the NEW input file 22. If a row corresponds to a 
15 special handling "chunk" as determined at step 90, then it is skipped. 
The value of L is checked at step 86. If a row has an L value of zero, 
thereby indicating that no match was ever located, then the "chunk" 
size is added to the current size and S of this row is set to that value 
— at-step^-SS, — l«=4h4s==6a^ T ^e=Re^^^ 

2 0 direction, checking at step 82 to see if the entire table has been 

processed. However, if the value of L for this row is nonzero, then 
the rows corresponding to "chunks" within the forward extent of the 
match described in this row are examined at step 94, and the one 
with the smallest value of S is located. The current size variable is 

2 5 set to the sum of the smallest S value located and the S value of the 

current row. S of the current row is set to the current size at step 94. 
L of the current row is adjusted downward at step 94 to stop the 
current match at the beginning of the "chunk" corresponding to the 
smallest S value located. All rows within the reverse extent of the 

3 0 current match are then skipped, after setting at step 96 all of their S 

values to the current size, and adjusting at step 98 their L values to 
reflect the new end of the current match. This entire process is 
repeated until the match table 26 is exhausted, as determined at step 
82, and then stopped at step 84. That is, the entire table has been 
3 5 processed when the first row of the table has been processed. 
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Fig. 7 is a flowchart which describes, in detail, the patch file 
encoding method 18 illustrated in Fig. 2. This process takes the 
match table 26, OLD input file 20 and NEW input file 22 as inputs and * 
• produces the output patch file 36 as output. The method proceeds by 
5 first setting a pointer at -step 100 and then passing the match table 
26 in the forward direction by beginning at the row corresponding to 
the first "chunk" in the NEW input file 22, and examining each xow in 
turn until that table is exhausted as indicated at step 102. As the 
various patch file records are processed, they are placed in a 

10 temporary storage means to be passed on at step 114 to the post- 
processing encoder at the end of the process, which is terminated at 
step i 16. If the row is marked for special handling as determined at 
step 104, then a corresponding "special handling" record is generated. 
This may require examination of this region of the NEV/ input file .22. 

15 The immediately subsequent t rows that correspond to "chunks" : 
within , the extent of the region covered by the special handling are 
skipped at step 110. If the row is. not. marked for special handliiigi 
but has an L value of zero as determined at step 106, then an "Add" 
record^ is generated at step 112, which requires examination of., the 

2 0 corresponding region of the NEW input file 22. All immediately 
subsequent rows, that also have zero L values and that, are;- not 
marked for special handling, are skipped. If, however, the current 
row is not marked for special handling and has a nonzero L value, 
, then both , the OLD input file 20 and the NEW input file .22 u are 

2 5 examined to locate any mismatches in this (possibly approximate) 

matched region. In this case, "Copy", xecords and possible "mod" 
records are generated at step 1,08, and all immediately subsequent 
rows that correspond to "chunks" within the extent >of this match are 
skipped. This entire process is . repeated by returning to the step, 102 

3 0. until the match table 26 is exhausted by the processing of the last 

row of the table, as indicated by the check at the step 102. The patch 
file records in the temporary storage means, are then passed to the 
post processing encoder at step 114 for final optimization and 
encoding. A preferred embodiment of this encoder is discussed in 
3 5 the "operation" section below. The post processing encoder then 
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passes the completed output patch file 36 to the output means 28 
shown in Fig. 2. 

OPERATION 

5 The following discussion describes the preferred operation of 

the invention. It should be understood, however, that there are 
alternate modes of operation which will yield desired results. There 
are a number of design parameters for the operation method that 
will impact the discussion of the method in various ways. These 
10 parameters will be defined, and the values for these parameters that 
are used in the preferred operational embodiment will be given. 

1. "Chynk" size : This parameter controls the minimum match size 
as well as the granularity of the match table 26 and must be a power 
of 2. Increasing this value allows one to increase efficiency at the 

15 expense of effectiveness. The preferred embodiment is 8, but values 
of 4, 16, etc. are certainly reasonable. 

2. Speed — factor : This parameter controls the granularity of the 
string table 24. Increasing this value allows one to increase 

2 0 embodiment uses user-selectable relative values between 0 and 10 
inclusive. 

3 - Approximate-matching tolerance : These parameters control 

how closely two strings must coincide to be considered a "match." 
Altering these tolerances allows one to tailor the method for specific 

2 5 patterns of changes with different kinds of data. Two tolerances 

used in the preferred embodiment comprise a local tolerance and a 
global tolerance, where both are expressed as fractions. A local 
tolerance of 8/6 specifies that out of each 16 adjacent characters, 8 
would be allowed to mismatch before the match was terminated. A 

3 0 global tolerance of 1/3 specifies that the total number of mismatches 

can be at most 1/3 of the total characters in the match. The 
preferred embodiment uses user-selectable pair values of local and 
global tolerances, respectively, of (8/16, 1/3) which are generally 
used with executable files, and (0/16, 0/1), which is generally used 
3 5 with text files. Note that the latter tolerance pair requires exact (as 
opposed to approximate) matching. 
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4. Chain length limits : This parameter controls the maximum 
number of identical key values that will be chained together in the 
string table 24. It is necessary to limit this for efficiency reasons. If 
no limit is placed on the length of a chain, then a file consisting 

5 entirely of repeating patterns degrades the efficiency of the match 
table preparation process . (see Fig. 2) without helping , its 
effectiveness. . If the value for this limit is low, effectiveness suffers, 
since some potentially useful matches will not be found. : The 
preferred embodiment uses, a fixed value of 20 for this parameter, 
10 but values between ,10 and. 100 are certainly reasonable. Altering 
this parameter based on available memory size is also a potentially 
helpful ramification of- the preferred, embodiment. 

5. Minimum special handling size : This parameter controls special 
handling block size to be . eligible for special handling described 

15 below. The preferred embodiment uses a fixed size of. 12 ^characters 
for this parameter. • 

6. Coding estimation parameters :> These parameters controkvthe 
estimates, used in the match table preparation and match table 

^optimization, of the,, number of- characters that will be takejr to 
2 0 encode a given match. '„ The preferred embodiment, uses t a fixed 
estimate of 6 .characters plus 3 characters per mismatch in, the 
forward direction. t .-".v .: 

Having described a representative set of operating parameters 
of the method shown in .Fig, 2, the preferred^ embodiment of . the 

2 5,. patch file construction method and apparatus will be described in 

detail. The OLD input file 20 and NEW input file 22 are input by 
means of memory mapped .files.,, which facilitates the particular 
access patterns used by the .method. Namely, the method needs to 
access rapidly and repeatedly the current block of the OLD input file, 

3 0 while the NEW Input file is passed ^ sequentially backward or forward. 

The match table 26 is also passed, sequentially backward or forward 
so that it is also stored in a memory mapped file in mass storage. 
The string table 24 is stored in random access memory (RAM) and 
the RAM size determines how much of the OLD Input file may be 
3 5 processed at a time. The temporary patch file storage used for 
buffering the patch file before it is passed to the post processing 
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encoder 114, as shown in Fig. 7, is a dedicated RAM with overflow to 
mass storage, since there is not any a priori limit on the size of this 
temporary patch file storage. The output patch file 28 is an ordinary ' 
• sequential access file since this is adequate for the method. 

The patch file construction method begins by acquiring the 
largest available memory block for use in storing the string table 25 
The largest size of OLD Input file that can be processed into a string 
table of this size is then calculated. This becomes the size of the 
block of OLD input file that will be processed on each pass through 
the string table preparation and match table preparation shown in 
Fig. 2. 

The string table preparation 12 depends greatly upon the 
details of the particular embodiment of the string table 24 that is 
selected. In the preferred embodiment, the string table 24 consists 
of a modified 8-way B-tree with linked list leaves as described 
above. The string table preparation process 12 is executed by then 
consists of the following steps: 

le^LD^tefku^fel^^^^ _ 
portion to process, and initialize the B-tree to "empty." 

Ste P 2 ' If the ke y (string of the same length as a ,, chunk ,, ) 
present at the input pointer is eligible for special handling, skip it 
and go to step 6. 

Ste P 3 - Lookup the key in the B-tree. 

£tep_4. If the key is not in the B-tree, insert it and begin an 
associated instance list. 

SteE_5. If the key is present in the B-tree and the associated 
instance list has not reached the maximum chain length, add this 
instance to the list. 

£tgri_£. Add twice the speed factor plus one to the input pointer. 
If there is still a key remaining in the portion to process, go back to 
step 2. 

"Special Handling" in the preferred embodiment means that the 
key comprises either 8 repetitions of a single character, four 
repetitions of a two-character pattern or 2 repetitions of a four- 
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character pattern. If this pattern continues for a total of at least 12 
. characters,, then the block is eligible for special handling. Potentially 
helpful ramifications often include the addition of other types of 
special handling blocks. . , .. 

5 Skipping by odd amounts in step 6 (when the '•chunk" size is a 

power of . 2), insures that a sufficiently long match will ".always be 
- fcnmd, even , though not every "chunk" in the OLD input file 20 is 
tabulated in the string table 24. As .an example, if the speed factor is 
. equal to ,10, then only the keys whose positions are a multiple of 21 
10 will be placed in the, string table. If, however, an exact match 
between OLD and NEW files has a length of at least 91 characters, 
then this match will be located because at least one of /the NEW 
"chunks" included in the match will exactly match a key from the 
, OLD file placed An the string table. . ^ : ; : - 

15, The lookup and insertion procedures t mentioned, above- .are 

: standard, operations on a B -Tree as described, for ^example, in Knuth, 
.,, The r Art of .Computer Programming, Vol. 3,; and are hereby 
, incorporated into this . disclosure by reference. - The method- for 
associating a key value with an instance list pointer in the preferred 
2 0 embodiment of the string table 24. will, however, be described. 
. Referring to Fig. 4, keys are located in three v different types of nodes 
in the string . table , which ; are the root node 42; interior .nodes 44 : and 
leaf nodes, 46. If a key is in a leaf node 46,, it is clear where the 
corresponding instance list pointer is . located in the : coxpsponding L 

2 5 / field of, the .^same node, v . However, the keys in the root .and interior 

nodes also have associated instance lists that are pointed to by the LP 
fields of various leaf nodes. The important , fact to realize here is 
■ that there are always more leaf nodes than there are keys in all root 
< and interior nodes combined. Specifically, one can begin at any key 

3 0 in a root pr interior node and arrive at. a unique leaf node by the 
. : following process: _(1) proceed down the tree by moving "to the right" 

at the first stage by using the C field whose index is one greater than 
the t index of the key, ( and (2) then taking CI pointers until a leaf is 
. reached. -The LP . field of this leaf node points to the instance list 
3 5 associated with the original key. 
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The match table preparation process was described above in 
the discussion of Fig. 5. To that description, details will be directed 
toward how to mark "chunks" in the match table 26 for special 
handling and how to mark a match in the match table. As mentioned 
5 above, special handling is indicated in the match table by certain 
reserved values in the RL field. The preferred embodiment reserves 
values above FFFFFF00 in hexadecimal to indicate various types of 
special handling: FFFFFF01, FFFFFF02, FFFFFF03 are used to indicate 
repeated characters, repeated two-character patterns and repeated 
1 0 four-character patterns respectively. The method allows other types 
of special handling. 

To mark a match in the match table, the I field is set to the 
position in the OLD input file 20 that matches this "chunk," the L field 
is set to the maximum extent of the match in the forward direction 

1 5 (including the size of the base "chunk"), the RL field is set to the 

maximum extent of the match in the backward direction (not 
including the size of the base "chunk") and the S field is set to reflect 
the coding estimate derived from the number of mismatches in the 
— ^^fcjawa^d** ex-tension -46 plus 3 times -the=Burafeer^0^m4*»atebe«- -in the 

2 0 preferred embodiment). Marking the match is completed by setting 

I of the previous row to I minus the "chunk" size, L of the previous 
row to L plus the "chunk" size, RL of the previous row to RL minus 
the "chunk" size, and S of the previous row to S. This process is 
repeated as long as RL is larger than the "chunk" size. The match 

2 5 table optimization 16 process was thoroughly described above in the 

discussion of Fig. 6. 

The patch file encoding process 18 was described above in the 
description of Fig. 7. Details are now added concerning the encoding 
of the temporary patch file storage which is passed to the post 

3 0 processing encoder, as well as details of the preferred embodiment of 

the post processing encoder itself. The temporary patch file 
comprises a rectangular four-column array, with each row 
corresponding to a patch file record and comprising a code, a NEW 
file position, a length and a modifier. A code describes the record 
3 5 type and is one of the following values: 0 for "Copy," 1 for "Add," 2 
for "Mod," 3 for "Special Handling 1," 4 for "Special Handling 2," 5 for 
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"Special Handling 4." The NEW file position indicates the position in 
the NEW file to which this record pertains, and the length field 
indicates its length. The meaning of the modifier varies according to 
the value in the code field: for "Copy" records, it indicates the OLD 
5 file position from which the block is to be copied; for "Add" records it 
is unused; for "Mod" records it indicates the difference between the 
NEW file contents at that position and the OLD file contents that will 
have been copied into that position during patch application; for 
"Special Handling" records it indicates the 1-, 2-, or 4-character 

10 pattern that will be repeated through that region. This encoding 
(with minor variation) is also used for the auxiliary table 38 in the 
patch application process shown in Fig, 3. 

Various optimizations are possible in the post processing 
encoder (See Fig. 7). The preferred embodiment performs the 

15 following steps: 

Step 1 . Sort all "Copy" and "Special Handling" records - in 
increasing order by NEW file position. * 

Step 2 . , Add 10 to the Code field of any of these records that are 
2 0 not immediately preceded by a "Copy" or "Special Handling" record 
(these new codes represent "Copy With Gap," "Special Handling 1 
With Gap," etc.) 

Step 3 . Form an array, which consists of packed forms of these 
records (specific formats are given below). 

2 5 Step 4 . Add to this array another record, which is a single record 

containing all "Mod" records in packed form, sorted in order by NEW 
File Position (specific format given below). 

Step 5 . Add to this array another record, which is a single record 
containing the actual characters from all of "Add" records, sorted in 

3 0 order by NEW File Position (specific format given below). 

Step 6. Add to this array another record marking the end of the 
patch file (specific format given below). 

Step 7 . Pass this entire array to any general purpose data 
compression routine. 

35 
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The preferred formats of all packed records are as follows. A 
"varindex" field is a • variable-length encoding of a 32-bit unsigned 
integer (that is, an integer between 0 and 4,294,967,295 inclusive) in 
which one character is used for magnitudes between 0 and 127 
5 inclusive, two characters are used for magnitudes between 128 and 

16.383 inclusive, three characters are used for magnitudes between 

16.384 and 2,097,151 inclusive, four characters are used for 
magnitudes between 2,097,152 and 268,435,455 inclusive, and five 
characters are used for magnitudes between 268,435,456 and 

1 0 4,294,967,295 inclusive. Specific formats are as follows: 

Format for "End Of File": 
Code (16) 

1 5 Format for "Copy": 

Code (0); OLD Position (varindex); length (varindex) 

Format for "Copy With Gap": 

— - Code -W};-^ 

20 (varindex) 
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Format for "Special Handling 1": 

Code '(3); Pattern (character); length (varindex) 

Format for "Special Handling 1 With Gap": 

Code (13); Gap Size (varindex); Pattern (character);, 
length (varindex) 

. Format for "Special Handling 2": 

Code (4); Pattern (two . characters); length (varindex) 

Format for "Special Handling 2 With Gap": . . . 

Code (14); Gap Size (varindex); Pattern (two characters); length 
(varindex) 

15 Format for. "Special Handling 4".:,. . 

" Code •(5); t .Pattern .(four characters); length (varindex) 

Format for "Special Handling 4 With Gap": . 

Code (15); Gap Size (varindex); Pattern (four characters); length 
2 0 J- (varindex) / :\ \ ." . . . - . 

. Format. for ;" Add": . j- . .. . . , ;. 

■ Code '(:!•); Total -Length (varindex);. Characters 
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Format for "Mod": 

Code (2); Total Number of Mods (varindex); Diffl (character); 
Pos. Incl (varindex); Diff2 (character; Pos. Inc2 (varindex; ... 

Diffx 

5 (character); Pos. Incx (varindex) 

In the last format, the "Diff" fields are the various amounts that must 
be added to a position to fix up a mismatch, and the M Pos. Inc" fields 
are the distances (position increments) between successive 

1 0 mismatches (Pos. Incl is the position of the first mismatch, Pos. Inc2 

is the distance between the first and second mismatches, etc.). 
More details on the ,, varindex M format follow: 

the 32 data bits of the quantity to be encoded will be 
15 denoted X0 XI X2 ... X31 (from least significant to most significant); 

quantities encoded in one character set X7 through X31 to 

zero; 

quantities encoded in two characters set X14 through X31 
-to— zmo,-- and— aUkas t ™Qne_pf.„X7_thmu.gh„ X13^to„.nsmj£m u .j%u amities 

2 0 encoded in three characters set X21 through X31 to zero and at least 

one of X14 through X20 to nonzero; and 

quantities encoded in four characters set X28 through 
X31 to zero and at least one of X21 through X27 to nonzero; 
quantities encoded in five characters set at least one of X28 through 

2 5 X31 to nonzero. 

In the encoding to follow, successive characters in the encoding are 
delimited by a colon and, within a character, the bits are written 
from most significant to least significant. Then the encodings are as 

3 0 follows: 

One character: 0 X6 X5 X4 X3 X2 XI X0 

Two characters: 10 X13 X12 XI 1 X10 X9 X8 : X7 X6 X5 X4 X3 X2 XI 
X0 

3 5 Three characters: 110 X20 X19 X18 X17 X16 : X15 X14 X13 X12 XI 1 
X10 
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X9 X8 : X7 X6 X5 X4 X3 X2 XI X0 
Four characters: 1110 X27 X26 X25 X24 : X23 X22 X21 X20 X19 XI 8 

X17 X16 : X15 X14 X13 X12 Xll XI 0 X9 X8 : 
X7 X6 X5 X4 X3 X2 XI XO 

5 Five, characters: .11110000 : X31 X30 X29 X28 X27 X26 X25 X24 : 
X23 . 

...... ...... X22 X21 X20 X19 X.18 X17 X1.6 : X15 X14 X13 

X12 XI 1 X10 X9 X8 : X7 X6 X5 X4 X3 X2 XI XO 

1 0 Patch File Application . - s , ' ; 

The preferred ; embodiment of the patch file application method 
and apparatus will now be described in more detail. _ According : to 
Figs, la, lb and 3, the OLD input file 34 is input by means of a 
memory mapped file to facilitate the access patterns of the method, 
15 i while; the input patch file 36 is input via a normal sequential access 
file. The auxiliary table 38 is stored in RAM with overflow to mass 
storage, „ since there is no a priori limit on the size of this table. _ ( The 
- Qutput NEW file„.40 is built using a memory mapped file to facilitate 
the random access patterns needed: • r ^ 

2 0 Referring to Fig. 3, the patch file decoder 30 essentially 

r accomplishes /the ^reverse of the. post processing . encoder portion of 
r the patch file encoder 18.. That is, the pattern" file decoder 30 
, decompresses, the input; patch file 36 .and decodes it into the auxiliary 
table :38 V ; which is formatted, in the preferred . embodiment, exactly 

2 5 - like the temporary patch file . storage of the- patch file -encoder 18. 

f More specifically, the - auxiliary, table 38 comprises a rectangular 
four-column array, with each row corresponding, to ;a patch -file 
record and comprising a code, a NEW file position, a length, and a 
modifier? The code describes the record . type , and * is one of the 

3 0 .following values:*- 0 for "Copy"; 1 for "Add";- 2 for "Mod 1 ;; 3 for^Spgcial 

Handling 1"; 4 for "Special Handling 2"; and 5 for "Special Handling ~-4." 
The NEW file position indicates the position in the^NEW file ; to which 
this record pertains and the, length field . indicates its length. The 
meaning of the modifier varies according to ,the_value in the . code 
3 5 field for: v.- - ■; . >. ',: 
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"Copy" records, it indicates the OLD file position from which the 
block is to be copied; 

"Add" records, it is unused; 

"Mod" positions, it indicates the difference between the NEW 
5 file contents at that position and the OLD file contents that will have 
been copied into that position during patch application; and for 

"Special Handling" records, it indicates the 1-, 2-, or 4-character 
pattern that will be repeated through that region. 

1 0 This auxiliary table 38, unlike the earlier table, is partitioned into 
separate sections for Copy/Special Handling, Add and Mod records. 
This decoding proceeds by the following steps; 

Step 1 . Decompression of the patch file. 

1 5 Step 2, Decoding of the various patch file records, using the 

formats described above. 

Ste P 3. Synthesizing the NEW file position for each record (see 
below for more details) and placing each decoded record into the 
appmpriate^par^ _. 

20 

It is to be noted that none of the records in the input patch file 
contains a NEW file position. Rather, the NEW file position for each 
record is implicated from the fact that all the Copy and Special 
handling records were packed into the patch file in order and any 

2 5 regions not covered by these records were indicated in various Gap 

sizes. To accomplish the synthesis of the NEW file positions, the 
following method is used: 

Ste P 1- Initialize the current position and the total Gap Size to 0. 

3 0 Step 2. Read a record. If it is an "Add" or a "Mod" or an "End Of 

File" then stop this process. 

Step_3> If this record contains a Gap, then add an entry in the 
"Add" partition of the table with length equal to the Gap Size, and 
NEW file position equal to the current position. Add the Gap Size to 
3 5 the current position and to the total Gap Size. 
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Step 4 . Add an entry to the "Copy/Special Handling" partition of 
the table with code, length and modifier taken from the record and 
NEW file position equal to the current position. Add the length to the 
current position. 
5 Step 5 . Go to Step 2. . 

When the above process is complete, the "Mod" record^ if present, 
will be processed in a similar way. When the "Mod" record has been 
processed, the "Add" record, is processed (if present) by adding one 

1 0 additional entry to the "Add" partition of the table, which has NEW 
file position equal to the current .position and length to the difference 
between the size of the "Add" record and the total Gap Size computed 
earlier. At this point, the preferred embodiment begins the NEW file 
construction process 32. . r . ; .. 

1 5 The NEW File construction process reads the "Copy/Special 

Handling", partition of the auxiliary table 38 and .performs all the 
indicated actions in order. Then it reads the "Mod" partition ^of;. the 
auxiliary table and performs all the. indicated modifications. Finally, 
it reads the "Add" portion of the auxiliary table (in order) and placed 

2 0 the decompressed characters from the "Add" record (if present — 
these characters were not processed earlier) into the appropriate 
{ place in the Output NEW file 40. : 

At this point, the Output NEW, file 40 is an exact duplicate of 
the Input NEW file 22 as illustrated .. in Figs. .la., and _lb thereby 

2 5 accomplishing a stated object of the invention. ... ■ 
While the foregoing, is directed, to the preferred embodiments 
of the invention, the scope thereof is determined by the claims that 
follow. 

What is claimed is: , >> - < - 
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CLAIMS 



1 0 



1. A method for converting an OLD computer file into an updated 
NEW computer file, the method comprising the steps of: 

(a) identifying differences between said OLD file and a NEW 
computer file by passing both files through a patch build program; 

(b) creating a patch file based on identified differences; and 

(c) combining said OLD file and said patch file by passing 
both files through a patch apply program to obtain said NEW file. 



2. The method of claim 1 wherein said patch file is created from 
said OLD file at a central facility, and patch files are subsequently 
distributed to a plurality of remote facilities to convert a plurality of 
the OLD files into plural NEW files and including the additional steps 

1 5 of : F 

(a) creating an auxiliary table from said patch file with a 
patch file decoder subroutine; and 

(b) combining said auxiliary table, said OLD file and said 
,patch_file_.with = a_NEW file construction subroutine thereby generating 

2 0 a copy of said NEW file. 

3. The method of claim 1 wherein the step of creation of said 
patch file comprises the steps of: 

(a) selecting a portion of said OLD file; 

2 5 (b) processing said portion of OLD file with a string table 

preparation subroutine thereby creating a string table from said 
selected portion of OLD file; 

(c) combining said portion of OLD file, said NEW file and said 
string table by a match, table preparation subroutine thereby 

3 0 creating a match table having identified approximate matches 

between said OLD file and said NEW file; 

(d) repeating steps (a) through (c) until all portions of OLD 
file have been processed; and 

(e) combining said OLD file, said NEW file and said match 
3 5 table in a patch file encoding subroutine to create said patch file. 
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4. The method of claim 3 wherein said match table subroutine 
comprises the additional steps of: 

(a) successively .selecting fixed-length substrings from the 
NEW file; 

5 (b) utilizing the- string table to find any matching copies of 

the selected substrings in said OLD file portion; and 

(c) extending said matching substring copy to a longer, 
approximate match in the forward and reverse directions. : 

10 5. The method of claim 3 wherein a string table is prepared from 
the OLD file by the steps: - - : . : , : r : ./" 

v . (a.) successively selecting fixed-length substrings from said 
portion of OLD file whose position relative, to beginning of said OLD 
file portion is a multiple of a fixed increment, said fixed length and 

15 fixed increment- having no common factors- greater than one; and 

(b) if said selected substring does not comprise a repeating 
pattern, recording location of said selected substrings in said string 
table. . .• : _/ ; : - , / - - ■ ■ - : .-; . /•> 

2 0 6. The, method /of claim 5 wherein -substrings which have already - ^ 
been recorded in said string table a fixed number of times are^not 
recorded further in said string table until a subsequent portion of 
said OLD file, is selected.. ^ // : ; : .... ■>■;, r ; 

2 5 7. A computer ; program for converting /an OLD -computer file into 

an -updated ~NEW. computer file, the program comprising: 

(a) a NEW .file .template program which defines the form of 
said NEW file; .//./ c> . ,,// „ r 

(b) a patch build program which identifies a difference 

30 r . (i) identifying, a portion .of said OLD file to be ^ 

converted,, ~ v ~ \ >\ > ■ -i, and ♦ /• - ■■>;,. ■ - : • 

. . (ii) creating a patch file; and . ; r- . f.c" 

(c) a patch apply program which converts /said OLD file into 

said NEW file by combining; said patch file with said portion -pf the 

3 5 OLD file to be converted, to .form said NEW file. . 
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8. The computer program of claim 7 further comprising: 

(a) a string table memory receiving a string table from said 
OLD file formed by a string table subroutine; 

(b) a match table memory for receiving the result of a match 
5 table subroutine which combines said OLD file, said NEW file and said 

string table to create a match table having identified approximate 
matching regions between said OLD file and said NEW file; and 

(c) a patch file encoding subroutine which combines said OLD 
file, said NEW file and said match table to create said patch file 

1 0 

9. The computer program of claim 7 further comprising: 

(a) a patch file decoder subroutine which creates an auxiliary 
table from said patch file; and 

(b) a NEW file construction subroutine which combines said 
15 auxiliary table, said OLD file and said patch file for generating said 

NEW file. 

10. The process of forming smaller patch files using the program of 

claim--.7^and_cjQmprising:-. .- 

20 < a ) inputting the OLD file to a string locating subroutine to 

collect a string table; 

(b) inputting the string table into a match subroutine 
provided with the NEW input file to form a match table; 

(c) collecting the matches from the table and non-matching 

2 5 regions from said NEW file to form a list of patches; and 

(d) encoding said patches into a patch file for sending to a 
plurality of remote locations so that the remote locations can use said 
patch apply program to convert OLD files to NEW files. 

3 0 11. A method for controlling a general purpose computer, and 

causing it to compare an OLD binary file and a NEW binary file, to 
construct a patch file containing the differences between the OLD and 
NEW binary files, and subsequently controlling a remote different 
general purpose computer to use said patch file to reconstruct and 
3 5 output said NEW file from said OLD file comprising: 



BNSOOCID: <WO 9904336A1_I_> 



WO 99/04336 



27 



PCT/US98/14433 



(a) a binary file input sequence enabling the OLD and NEW 
files to be accessed repeatedly and progressively;. ., 

(b) a string table subroutine^ forming stored data structures 
comprising a plurality of fixed-length substrings within a selected 

5 portion of said OLD file; 

(c) applying the string table subroutine , to successive 
portions of said OLD file so that the contents of said string table are 
constructed based on selected portions of said OLD file so that the 
OLD file is completely input thereto; + -.. 5 . . 

10 , , , (d) a match table for storing data, structures . describing 
- ^ approximate matches between portions of the NEW file compared to 
the OLD file;. ; : y ; - • ; . , ,, ,, ; - 

t (e) inputting to the match table the said , NEW file partitioned 
into selected length substrings from : the OLD file, processed ^through 
15 the string table wherein said substrings are extended to longer 
approximate matches in either the forward or reverse directions; 

■'.(f)-. cyclically repeating string table preparation and match 
. table preparation wherein repeated, successive portions of said first 

file, are: processed; - - v .-. ; . : • ; 

2 0 ... (g) /optimizing the match table lengths of % the approximate 
matches to produce a patch file of smaller patches; v ^ - wy 

t (h ; ). forming a completed patch file.. for subsequent use;- 

(i) forming a binary computer file including j^ie ^patch, ,file 
frpm the data structures in the match table encoded into a binary 

2 5 computer file and transmitted to a patch file; driven^ different 

, .computer; and ;i ; - ; : : '.:Vv : : 

(j) applying said patch file reconstruction to said OLD file 
with said encoded patch file. - ■ ; ^ , 

3 0 12. The method of claim, 11 wherein the step of forming the match 

table defines a substring approximate match . to : occur when the ratio 
of total mismatched characters to total characters is less than a 
selected ratio, and no adjacent characters having a selected and fixed 
block size contains more mismatches than a fixed and selected value 
3 5 and :said selected ratio and said selected value^ are input to control 
forming the matches. 
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13. The method of claim 11 wherein match table optimization 
proceeds by moving backwards through said NEW file by fixed 
increments, and progressively recording estimates of the number of 

5 characters required to encode said NEW file from that point forward 
to an end point, and adjusting the length of the match to minimize 
said estimate. 

14. The method of claim 11 wherein said string table subroutine 
10 proceeds by placing into said string table data structures necessary 

to locate those substrings of said OLD file which have a fixed size and 
which begin at a location relative to the last selected portion which is 
a multiple of fixed length, and wherein said fixed length and fixed 
increment have no common factors greater than one and said 

1 5 increment is chosen from a plurality of choices. 

15. The method of claim 11 wherein substrings of said OLD file 
comprising repeating patterns of a plurality of sizes are not input 
*n*©-*aid^ s^n=g=--=table by said string table preparation method; and 

2 0 said fixed-length substrings of said NEW file comprising repeating 

patterns of a plurality of sizes are not processed by said match table 
preparation method, and repeating substrings are encoded separately 
into said patch file. 

2 5 16. The method of claim 11 wherein said patch file encoding 

method proceeds by forming patch file records of a plurality of types, 
including: 

(a) copy of records, wherein said completed patch file is 
instructed to append a copy of a region of said OLD file to 

3 0 reconstruction of NEW file, and comprising: 

(i) an identifying code; 

(ii) a location in said OLD file; and 

(iii) a length; 

(b) copy with gap records, wherein said completed patch file 
3 5 is instructed to append a gap to said reconstruction of NEW file, 
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records the position and length of said gap, and then append a copy 
of a region of said OLD file and comprising: 

(i) an identifying code; 

(ii) a gap size; 

5 (iii) a location in said OLD file; arid 

(iv) a length; 

(c) mismatch modification records, wherein said completed 
patch file is instructed to set a current modification position to the 
beginning of said reconstruction of NEW file, and then to modify 
10 certain individual characters in said reconstruction -of NEW file, and . 
comprising: 

(i) an identifying code; 

(ii) a modification count; 

(iii) a plurality of modification identifiers, comprising: 
15 . (1) an increment which is to be added to said 

current modification position of said 

reconstruction of NEW file; 

(2) a difference which is added to the character 
at said modification position in .said 
2 0 ... reconstruction, of NEW file, and c 

,(d) add records,., wherein .said completed., patch file is 
instructed to place indicated characters in previously, recorded.; gaps 
m said reconstruction of NEW file and append any remaining 
characters to said reconstruction of NEW file, and comprising: 

2 5 (i) an identifying code, . 

(ii) a length, and , 

(iii) , characters to be placed is said gaps or appended to 

said reconstruction of NEW file. 

3 0 17. The method and apparatus of claim 11, wherein said patch file - 

encoding method proceeds by forming patch file records of plurality 
of types, including - 

(a) copy records, wherein said patch application method is 
instructed to append a copy of a region of said QLD file to said 
3 5 reconstruction of NEW file, and comprising: 
(i) an identifying code, 



BNSDOCID; <WO 9904336A1_I_> 



WO 99/04336 



PCT/US98/14433, 



(ii) a location in said OLD file, and 

(iii) a length, 

(b) copy with gap records, wherein said completed patch file 
is instructed to append a gap to said reconstruction of NEW file, 
record the position and length of said gap, and then append a copy of 
a -region of said OLD file to said reconstruction of NEW file, and 
comprising: 

(i) an identifying code, 

(ii) a gap size, 

(iii) a location in said OLD file, and 

(iv) a length, 

(c) pattern fill records, wherein said completed patch file is 
instructed to append to said reconstruction of NEW filled with a 
repeating pattern, comprising: 

(i) an identifying code, which also identifies length of 
said pattern, 

(ii) a pattern, and 

(iii) a length, 

4M^&ttem^^^aiiih,^ e a^^ee^d& i wherein said completed 
patch file is instructed to append a gap to reconstruction of NEW file, 
record the position and length of said gap and then append to said 
file a region filled with a repeating pattern, and comprising: 

(i) an identifying code, which also identifies length of 
said pattern, 

(ii) a gap size, 

(iii) a pattern, and 

(e) mismatch modification records, wherein said completed 
patch file is instructed to set a current modification position to the 
beginning of said reconstruction of NEW file, and then to modify 
certain individual characters in said reconstruction of NEW file, and 
comprising: 

(i) an identifying code, 

(ii) as modification count, 

(iii) a plurality of modification identifiers, comprising: 
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( 1 ) an increment which is to be added to said 
current modification position of said 
reconstruction of NEW file, 

(2) a difference which is added to the character 
5 at said modification position in said 

. ... reconstruction of NEW file, 
(f) add records, wherein said completed patch file is 
instructed to place indicated characters in previously recorded gaps 
in said reconstruction of NEW file and appends any .remaining 
10 characters to said reconstruction of NEW file, and comprising: 

(i) an identifying code, 'r- — 

(ii) a length, and 

(iii) characters to be placed in said gaps or appended to 
said reconstruction of NEW file. 

15 f , ^ _ : - • ~ ■ - , « ■ . ... r . , , < , ,,- . • - ■. 

18. The method of claim. 1L including the steps of: . ; . 

(a) forming the string table; : * 

(b) forming the match table; — 

(c) forming the patch file derived from the string and match 
20,. table; and . ., . . . .. ; . : - , 

^ (d) defining, different measures of match in a match -table 
subroutine so that the patch file is minimized in size. ' 

. 19. The method v .of: claim 18 ^including the steps, of progressively 

2 5 building the .match,, table by .repeated; processing of successive* 

positions if the,., NEW file, ^and wherein repeated portions thereof are 
processed .to r completion. ^ :> .< . - < ■ - : >• ; 

.20. The method:. of claim. 1 9, . wherein the; match table encodes 

3 0 match position, number of matching symbols in sequence,- and 

estimated number of symbols to encode the match. 

21. Apparatus,, for controlling a .general purpose computer to 
compare an OLD binary file and a NEW binary file, to construct a 
3 5 patch file containing .the ^differences between said binary files, t and 
subsequently enabling a general purpose providing with said patch 
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file to construct said NEW file from said OLD file and patch file, 
comprising: 

(a) a file input means enabling the OLD and NEW files to be 
accessed repeatedly; 
5 (b) string table storage means storing data structures 

comprising a plurality of fixed-length substrings within a. currently 
selected portion of said OLD file; 

(c) a string table preparation circuit selecting successive 
portions of said OLD file so that the contents of said string table 

1 0 storage means are constructed based on the currently selected 
portioned of said OLD file; 

(d) match table storage means storing data structures 
describing approximate matches between regions of said OLD file and 
regions of ssaid NEW file; 

15 (e) a match table preparation circuit forming from the said 

NEW file fixed-length substrings located in said OLD file by structures 
stored in the string table means, and substrings are extended 
thereby to longer approximate matches in the forward or reverse 
directions;,. 

20 (f) wherein said file input means drives said string table 

preparation circuit and said match table preparation circuit for 
repeated, successive portions of said OLD file, until all portions of said 
file are processed; 

(g) a match table optimization circuit controlling the lengths 

2 5 of the approximate matches in the data structures in the match table 

storage means to produce a minimized patch file; and 

(h) patch file output means for patch file encoding so that the 
data structure in the match table storage means are encoded into a 
binary computer file and transmitted by said patch file output 

30 means. 

22. The apparatus of claim 21 wherein said match table 
preparation circuit defines two substrings to match when the ratio of 
total characters is less than a selected proportion, and a block of 

3 5 adjacent characters contains fewer mismatches than a fixed value. 
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23. The apparatus of claim 21 wherein said match table 
optimization circuit proceeds by moving backwards through said 
NEW file by increments of a fixed size, and progressively makes and 
records estimates of the number of characters required to encode 

5 said NEW file from that . point forward to the end, and adjusts the 
length of the match used to encode said .block to -minimize said 
estimate. . . . .. ■ ■ J ., : L * , \ ; : 

- * • - _ . ■ _ ' . r 

24. The apparatus of claim 21 wherein said string table 

10 preparation circuit processes data structures necessary to locate 
those substrings of said OLD file which have length equal to a fixed 
size and which begin at a location relative to said currently selected 
portion which is a multiple of a fixed increment, and wherein said 
fixed size and fixed increment have no .common factors greater than 

15 one and said increment is selectable from a plurality of choices. , 

25. The apparatus of, claim 21 wherein substrings of said OLD .file 
comprising repeating patterns are not- placed into said string - table 
storage means, and said fixed-length . .substrings of said NEW file 

20. comprising repeating patterns of a plurality of sizes. - .are not 
processed by said .match table preparation circuit, but _ said repeating 
substrings are \ t encoded separately into said patch file. - v - ( . 

26. The apparatus of claim 21 wherein said patch file encoding 
25 proceeds by^forming patch file records of a plurality of types, - 

including: }> .- v ... „. , ~ . . . : , . .\ :r , r 

(a) copy records wherein said patch file output means is 
instructed to append a . copy —of a region of said OLD file to 
reconstruction of NEW file, and comprising: ~ ; v:; ,, l : 

3 0 v „. . (i) an identifying code, , , f . ( 

a location in said OLD file, and.. _ , : 
• (iii) a. length; , , r Vi . « ; , . v , ?f / 

(b) copy . with gap records wherein ( said .patch file -ouput 
means is instructed to append a ,gap to .the reconstruction of NEW file, 

3 5 rec.ord the position and length of said gap, and then copy a region of 
said OLD t file to the reconstruction of said NEW file : , and comprising: 
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(i) an identifying code, 

(ii) a gap size, 

(iii) , a location in said OLD file, and 

(iv) a length; 

5 (c) mismatch modification records, wherein said patch file 

output means is instructed to set a- current modification, position to 
the beginning of said reconstruction of NEW file and then to modify 
certain individual characters in said reconstruction of NEW file, 
comprising: 
1 0 (i) an identifying code, 

(ii) a modification count, 

(iii) a plurality of modification identifiers, comprising: 

(1) an increment which is to be added to said 
current modification position of said 

1 5 reconstruction of NEW file, and 

(2) a difference which is added to the character 
at said modification position in said 
reconstruction of NEW file; 

- — add_ records- ^wJierein said .patch file output means is 

2 0 instructed to place indicated characters in previously recorded gaps 

in said reconstruction of NEW file and appends any remaining 
characters to said reconstruction of NEW file, and comprising: 

(i) an identifying code, 

(ii) size 

25 (iiO characters to be placed in said gaps or appended to 

said reconstruction of NEW file. 

27. A method of processing NEW and OLD files through a patch 
build program in a computer comprising the steps of: 
30 ( a ) building a string table from the OLD file processed from 

the start to the end thereof to compile strings therein; 

(b) using strings in the string table to make matches with 
NEW file wherein the matches are stored in a match table; 

(c) for each match in the match table preparing for each 

3 5 match values of I, L, RL, and S wherein I is match location, L is 

number of consecutive approximately matching characters in the 
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forward direction,. RL is number of consecutive approximately 
matching characters in the reverse direction, and. S is estimated 
number of characters required to encode the match; 

(d) progressively changing the values of L to find the longest 
5 match. 

28. The method of claim 27 wherein the matches in the match 
table are progressively and dynamically changed by: 

(a) adding to L an adjacent chunk of the OLD file; ■ - 
10 , (b) modifying S dependent on the added chunk; 

(c) re-evaluating the value of L after steps > (a), and '(b) to 
obtain new values I, L, RI and S; and 
. (d) optimizing the match table by xe-evaluating matches 

therein so that the entire table is processed- by the repeated steps (a) 
15 and (b) until there are no remaining adjacent ^chunks to process. ; : 

... 29. - The method claim 28, wherein match table; optimization;: is 
dependent on chunk size and including the steps of changing ^ chunk 
size. 

20 ; - , , ^ , . , ' : • . ... , , ■ , ; 

- 30. : , The .method of claim 29 wherein chunk ; size is ^increased by 
doubling the v size. , \ • , .-. v . >• - t . ; 

31. A method for controlling a :; general purpose computer to 
progressively compare an OLD binary file with a NEW binary file, and 

2 5 construct a patch file containing. :the differences .between, the OLD and,; • 

NEW- binary files, and : to enable t subsequent operation of* a remote 
v and different general purpose computer to use said patch file to 
reconstruct and output said NEW file from said OLD file and 
comprising the steps of: 
30 (a) forming a string table of stored data structures - 

comprising a plurality of entries therein; - ^ 

(b) evaluating a selected portion of said OLD file so that the 
contents of said string table are constructed based on selected 
portions of said OLD file; 

3 5 (c) forming a match table for storing data structures 

describing approximate matches between portions of the NEW file 
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compared to the OLD file and inputting to the match table the stored 
data structures; 

(d) cyclically repeating string table and match table 
formation wherein repeated, successive entries into said tables are 

5 optimized in match table lengths to produce a patch file of selected 
sizes; and 

(e) forming a completed patch file for subsequent use from 
the data structures in the match table encoded into a binary 
computer file to be transmitted as a patch file for a remote computer 

1 0 to enable patch file reconstruction of said OLD file with said 
transmitted patch file. 

32. The method of claim 31 including the step of forming the string 
table from the OLD file by evaluating all of the OLD file, forming the 

1 5 match table by inputting all of the NEW file for match table 

formation, and progressively extending match table and string table 
formation wherein initial matches in the match table overlap other 
matches, and extending the process until match overlap ends. 

2 0 33. The method of claim 31 including the step of forming the match 

table initially with a set of stored data structures describing 
approximate matches, and subsequently refining the approximate 
matches in an iterative process. 

2 5 34. The method of claim 31 including the step of forming the match 

table by serially processing the entire NEW file to completion, and 
thereafter optimizing the match table to reduce the size of the patch 
file. 

3 0 35. The method of claim 31 including the step of forming the patch 

file as a data stream stored on a memory media. 

36. The product made by the method of claim 35. 
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37. The product made by the remote computer when provided 
with the patch file of claim 31 and the OLD file and which comprises 
the NEW file data stream. 

5 3 8. The method of claim 31 wherein said string table is formed _by 
locating repeated patterns in said OLD file, and forming the match 
table ' initially with approximate matches wherein the matches are 
progressively refined while reducing the patch,. file size. _ 

10 39. the method of claim 31 including ... the repeated" step of 
evaluating . the OLD file • until the entire OLD file is evaluated, and 
including the step of evaluating the entire NEW file for matches with 
the entire OLD file. - •" 

15 40. The method of claim 39 including the step of completing the 
match table with the NEW file, and optimizing the match table 
overlapping the step of completing the match table. 
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Fig. 2 - Patch Construction 
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Fig. 3 - Patch Application - Preferred 

Embodiment 
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Fig. 5 - Match Table Preparation Flowchart 
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Fig. 7 - Patch File Encoding Flowchart 
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(57) Abstract 

This method and apparatus set out an updating 
system to convert ah OLD input file (34)"into a NEW 
file (40) with a distributed patch file (36). The OLD 
input file (34) is examined to locate strings in it 
which are stored in a table; the-NEW file is tested 
for matches (perfect matches are not mandated) which 
are stored. In an iterative process, matches are refined 
and optimized^ resulting in- a smaller and more easily 
distributed patch -file (36). The patch file (36) can 
be sent to locations where the OLD input file (34) is 
converted by the patch file into a NEW file (40). 
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SYSTEM FOR FINDING DIFFERENCES BETWEEN TWO COMPUTER FILES AND UPDATING THE COMPUTER 
FILES 



5 

BACKGROUND OF THE INVENTION : : 

This invention is directed toward apparatus and methods for 
' updating existing computer files. More particularly, the invention is 
10" directed toward methods : and apparatus for identifying • only the 
'portion of the : file to be changed, forming a patch program to 
" implement the change at the bit level, and forming a Patch Apply 
program to implement the desired changes in the existing file 
thereby forming a revised file, meaning a NEW or updated file. 

15 ' : ' : " ; - J ' ' ] ' ' : \\ ' ' " 

BACKGROUND OF THE ART 
' 1 The changing or "updating" of files is a - normal process in 
computer science. Application files are routinely updated as 
technology advances and * improvements are developed. Application 

2 0 files are also frequently updated to eliminate bugs that are found 
during usage. - Data files are typically modified as new data are 
acquired, or old data are determined to be invalid. Files comprising 
text are also routinely modified vr f or numerous reasons well known in 
the art. , ... 

2 5 Traditionally, changes: in files have been implemented by 

creating a completely NEW file containing all desired changes, and 
distributing this full, modified file to all users to be downloaded to 
replace the existing file. Distribution of a completely new version of 
the file is costly to the provider? Reinstallation of the new version is 

3 0 costly to the end users. Often, the initial version of the file is lost, 

and cannot be retrieved for economic or technical reasons. 
Reinstallation is bandwidth or "media intensive for the end user. 

File "patching" • techniques have been used in the prior art to 
revise existing files. the administration of these techniques is 
3 5 complex, and installation of the patch is often as involved technically 
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2 ' 

and economically as the installation of a fully revised version of the 
file. 

In view of the previously described methodology and prior art, 
an object of this invention is to provide a system to modify or patch 
5 an existing file at the bit level and only in those areas of the file 
requiring modification. 

Another object of the invention is to provide a system for 
easily building a patch, and easily installing the patch. 

Yet another object of the invention is to provide a system for 
1 0 finding the difference between an existing computer file and an 
updated version of the file using a fixed amount of memory, and 
using these differences to implement the desired update of the 
existing file. 

Another object of the invention is to provide a patch system 

1 5 which works with a single file, with a group of files, with directories, 

and with directory trees. 

Still another object of the present invention is to provide a 
patch system for an existing file which can be used to fix bugs, 
^--^mf^lemen^ program changes^ -im^plemen^ data and texture visions 

2 0 while providing an error checking system and a backup system 

which allows the user to restore the original existing file. 

Yet another object of the invention is to provide a system with 
which multiple changes or patches can be made in an existing file 
with a single application of the system. 

2 5 There are other objects and advantages of the present 

invention which will become apparent in the following disclosure. 

SUMMARY OF THE INVENTION 

The first step in the computer file update or patch process 

3 0 involves the building of a Patch File. The existing or original file, 

which will be referred to as the OLD file, and the revised file, which 
will be referred as the NEW file, are input into a Patch Build program. 
The differences in the OLD file and the NEW file are determined by 
the Patch Build program, and this information is output by the Patch 
3 5 Build program as a Patch File. Operation of the Patch Build program 
will be disclosed in detail in a subsequent section. The changes 
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required to convert the OLD input file to the NEW input file are 
transferred as the Patch File. 

The next step in the computer file patch involves the 
distribution of the Patch File, along with a Patch Apply program, to 
5 the end users so. that the OLD file, (wherever located) is efficiently 
; converted to the desired, updated NEW file. The OLD file and the 
Patch File are input by the, end user into the Patch Apply program. 
The Patch Apply program, changes, at the bit level, only the portions 
of the OLD file required to yield the desired file update. The updated 
10 file is therefore output from the Patch Apply program yielding the 
desired NEW file at the user level. 

By distributing only the Patch File .and Patch Apply program to 
the end users, the desired file update can be implemented by the end 
user with . - maximum operational; and: economic efficiency. 
1 5 . Furthermore, the update , is implemented with numerous safety: 
features including (1) automatic verification that the correct files 
have been used and that the patches have been built and -applied 
correctly, (2) automatic check for sufficient disk space, (3) restart 
- capability after power failure, (4) backup of any files affected by a 

2 0 patch; and (5) the ability to reverse the patched file or an, entire 

~ system to the prior condition. 

BRIEF DESCRIPTION OF THE DRAWINGS 

, . So that the manner in . which the above recited .features, 
25 - advantages and objectives of the present invention are attained and ; 
can be understood in detail, more , particular description of the 
invention, briefly, summarized above, may be had by reference to the 
appended drawings. It .^.is noted, : however, that the : appended 
drawings, illustrate only- typical embodiments of the invention, and 

3 0 therefor not. to be considered limiting in scope, for the invention may 

admit to. other equally effective embodiments. , ~ 

Figs, la and lb disclose a conceptual overview of the invention; 
Figs. 2 is a flow chart of the patch build program used to .create 

the Patch File; . . v * . ' , ' 

35 Fig. 3 is a patch apply flow chart forming a. NEW file; 

Fig. 4 illustrates a string table in the preferred embodiment; 
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Fig. 5 is a flow chart of the preparation of the string Table; 
Fig. 6 is a match table optimization flow chart; and 
Fig. 7 is a patch file encoding flow chart. 

5 DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Figs. la. and lb present a conceptual overview of the invention. 
For purposes of discussion, it will be assumed that software updates 
or revisions are distributed (by mailing a disc, by downloading or 
otherwise) to multiple end users from a central organization location. 

10 As an example, a software company may formulate an updated file 
version, and create the associated Patch File (along with the Patch 
Apply program), at its headquarters facility. This will be referred to 
as the "central" facility. Copies of the Patch File and Patch Apply 
program then are distributed to the company's customer user base so 

1 5 that the customers can update their existing files at their locations. 
These will be referred to as "end user locations". The NEW file used 
to create the Patch File at the central facility will sometimes be 
referred to as the NEW "template" file in order to distinguish this 
- --<^py-0^ 

2 0 at remote locations by conversion of remotely distributed OLD files. 

Fig. la conceptually illustrates activities at the central facility. 
The original file, which is referred to as the OLD input file 20, is input 
into a Patch Build program 23. The revised file, which is referred to 
as the NEW input file 22, is also input into PATCH Build program 23. 

2 5 The differences in the OLD input file 20 and the NEW input file 22 

are determined by the Patch Build program 23 as will be detailed in 
subsequent sections of this disclosure. The output of the Patch Build 
program is a Patch File 36. The Patch File 36 contains only the 
changes required to convert the OLD input file 20 to the desired NEW 

3 0 input file 22, and is readily distributed to end user locations. 

Fig. lb conceptually illustrates activities at one of typically a 
plurality of end user locations. The Patch File 36, along with the end 
users version of the OLD input file 34, is input into a Patch Apply 
program 31. While the OLD files are identical, the numeral 20 is used 
3 5 to designate this file copy at the central location and the numeral 34 
is used to designate the duplicate copies at the end user locations. 
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The Patch Apply program .31 is typically created at the central 
facility and distributed to the <end user locations. As illustrated in 
Fig. lb, it converts the end users OLD file 34, with the help of the 
Patch File 36 into the desired, updated NEW file 40 at the end user's 
5 facility. The Patch Apply program 31 changes, at the bit level, only 
the portions of the OLD file 34 required to yield the desired file 
update as will be described in detail in a subsequent section . of this 
disclosure. The resulting NEW file 40 is, therefore, output from the 
Patch Apply program 31 and is ; . the same as the NEW, file 22 at the 
10 central location, but it is identified with .the numeral 40 to emphasize 
that the conversion is made at the end user's facility. 

Fig. 2 illustrates the patch construction, process in more detail. 
Elements of the Patch Build program 23 are -enclosed within the 
broken line boundary as indicated. The OLD input .file, 20 is input to 
15 a string table coding circuit 12 forming a string table output to a 
matching circuit 14. . Both circuits are explained in detail below. The 

match table circuit has an input of a . the entire ' NEW >; input file- ^.Jn' 

conjunction with the stored string table 24, the match table circuit 
.produces the match table. 26, which is retained in a suitably sized 
2 0, match table storage means, a conventional memory module. The 
. operations of string table circuit coding and match ; table coding 
-alternate until all of the OLD input file 20 has been processed by the 
> string table circuit 12,. and the stored string table 24 has. ...been 
processed into the match table circuit .14. The stored ; match table 

2 5. data -is output to. the match table optimization circuit 16 to alter it. - 

When, this process is completed, the OLD input file 20, the NEW File 
: 22 and the match table -memory .26 are. input tp a patch file, encoding 
circuit 18 which serially passes the completed patch file .36 to a patch 
, file output means 28. . ^ - . - • .-..=■ - 

3 0 ; ,. :• Fig. 3 illustrates the NEW file - generation process at typical 

multiple end user locations. Elements of the patch, apply program 31 
are enclosed within the broken line boundary as indicated.- The 
patch file 36 is taken from the input means , to a patch _ file decoder 
30,,. which . produces an auxiliary table 38; retained_ in the ' auxiliary 
3 5 ; table memory.. Concurrent with, this process, a. NEW file construction 
-converter circuit 32 reads serially the portions of the auxiliary .table 
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38 from memory while it is being constructed, as well as the input 
patch file 36 and the OLD input file 34, and then serially sends the 
NEW file 40 to an output means 41. 

Referring now to Fig. 4 and Fig. 2, the string table 24 is a 
5 modified 8-way B-tree containing a plurality of nodes of five 
different types. As shown in Fig. 4, a root node 42, only one is 
needed, contains up to seven keys (labeled Kl through K7) and up to 
eight child pointers (labeled CI through C8) which identify an 
interior node 44 or a leaf node 46, and an empty "parent 1 ' node 43, 

1 0 all in the root node 42. Each interior node 44 (there may be none or 

a plurality), contains the same information as the root node 42, but 
with the addition of a parent pointer. The unique root node 42 and 
interior node 44 contain a child pointer identifying the interior node 
44 in question. Each leaf node 46 (there may be none or several) is 

1 5 identified by its location at a fixed distance (measured by the 
number of child pointers that must be traversed to it) from the root 
node and contains up to seven keys (again labeled Kl through K7) 
and corresponding list pointers (labeled 11 through 17), which each 

— - ^d«rtTfy==s^^ terminator - -50; -also ' the same parent 

2 0 information as the interior node and an additional field denoted LP (a 

list entry 48 or list terminator 50) corresponds to one of the keys in 
one of the parents of the leaf node 46. The method for associating 
each key in an internal node 44 or root node 42 with an LP field in a 
leaf node 46 is described in the subsequent section entitled 

2 5 "operation". Each list entry 48, of which there may be none or a 

plurality, contains an instance field 49' (labeled i), which identifies a 
location of the corresponding key in the OLD file, and a pointer field 
49 (labeled P), which identifies a list entry 48 or list terminator 50 
corresponding to the next instance of the same key. Each list 

3 0 terminator 50, of which there may be none or a plurality, contains 

the same instance field as a list entry, but is identified by an empty 
pointer 50' (labeled x) indicating the end of the list of instances of 
the corresponding key. 

The match table 26 (shown in Fig. 2) is a rectangular array with 
3 5 four columns and rows equal to the number of "chunks" in the NEW 
input file 22, where a "chunk" is a fixed-size portion of the file 
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(required to be a power of 2) and is an operational parameter of the 
overall patch file construction process illustrated in Fig. 2. Table 1 
illustrates a match table. 

5 TABLE 1 : , 



Match table 



II .... 


Li 


RL] 


Si. 


12 • 


L 2 . 


RL2 


s 2 


13 ; 


L3 


RL 3 


S 3 


1.4: i 

• 


- U , 


RL 4 

• 




-Ix-- 




RL X 


S x 



The four columns labeled I, : L, and S (and all encode parameters) 
•relate, to a possible match for this "chunk" in the OLD Input file .20. 
The L column identifies the number of consecutive characters in the 
,,OLD Input: file 20. starting at the position identified by the I column 
2 0 ithat match (at least approximately) the characters starting at . the 

- beginning of -:the "chunk" in, , the. NEW Input iile 22. - The RL column 
identifies the number of consecutive characters in the OLD Input file 
,20 before, the position identified, in the I column that match, (at least 

• , approximately) the characters before, this, "chunk" t in the NEW- Input 

2 5 file 22. If no match has been located for this "chunk" then the L and 

RL columns both contain 0. If this "chunk" has been identified as a 
candidate ./or., special handling described in the ' -subsequent 
"Operation".,; then the RL column will contain one of several possible 

- . special . handling markers (also described in a subsequent : section). 

3 0 .The S column contains different values at different /times. During, the 

match, table preparation shown in Fig. 2, the S column contains an 
estimate -of the,, number of characters required to encode this 
approximate match, together - with the mismatches that occur within 
it. During the; match table Optimization process , 16 shown in Fig.. 2, 
3 5 the S column is gradually converted to an estimate of. the total 
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number of characters required to encode the patch file from this 
"chunk" to the end. 

Fig. 8 of the drawings shows in broad sweep the operation of 
5 the match table preparation. Referring to table 1 given above, Fig. 8 
includes a first set of prospective or tentative matches. At the top. of 
Fig, 8, assume that approximate matches have been indicated in the 
series of matches which are Lj...L x . These matches may overlap, and 
Fig, 8 intentionally shows such an overlap. This occurs at a 
1 0 preliminary state of affairs. This occurs before optimization. 

As shown in Fig. 8, the preliminary stage may well include 
several tentative matches in adjacent blocks. Without regard to the 
blocks, without regard to the length as measured looking to the left 

1 5 or looking to the right, without regard to the values of L\ or the value 

of RLj, there m ay be an overlap at an early stage of approximation. 
At the end however the approximation is terminated so that the 
adjacent values of L k and also L k +i are adjusted so that overlap is 
_.^^Ha©v^dT«-«3i©.--- pr=ov=i4e^hi^gpaphdG-aHy-' so— that an understanding can 

2 0 be obtained, Fig. 8 shows a representative file and also shows the 

overlap of adjacent blocks. At the conclusion of the optimization 
process, the overlap is eliminated. In the upper part of fig. 8, the 
brackets are shown with an overlap while the optimization process 
removes the overlap so that the entries in the match table L k and 

2 5 Lk+i no longer overlap. 

Based on the foregoing it would then be observed that the 
operation of the optimization process ultimately continues until the 
position identified in the I column builds into the entire file has been 

3 0 handled and matches are now evaluated and are optimized. Since 

precise definition then prevails at the end of the process, the content 
of Table 1 is simplified. Effectively, of Table 1 is reduced to a series 
of entries which still includes the I column. The aggregate length of a 
given entry will then be different, and the S column will likewise be 
3 5 modified, all is explained below. 
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Fig. 5 is a flowchart that describes the match table preparation 
14. This process (as shown in Fig. 2) uses a completed string table 24 
and a NEW input file 22 as inputs to form a prepared (or updated) 
match table 26 output. The method proceeds by passing the NEW 
5 input file 22 backward, a "chunk" at a time. Recall that a "chunk" is a 
- fixed-size portion of the file (size is a power of 2) and is an 
operational parameter of the overall patch file construction process. 
The match table is initially set to "0" : at step 60 and a set pointer is 
set to the last "chunk" of NEW input data at step 62. As each "chunk" 
10 is processed, it is checked at step 64 to see if the string of input 
"chunk" has been exhausted. If exhausted, the process is stopped at 
.. . step 65. The next step in the method is to check this "chunk" for 
special handling at step 66. If special handling is warranted, this is 
marked in the match table at step 68 and the entire contiguous set of 

1 5 - "chunks" to which, the special handling applies is marked and .skipped 

at step 72. This step will be expanded in the subsequent section 
entitled ."Operation ,, , ; If special handling is not warranted, the. .string 
table 24 is consulted to locate ;all the instances in the current portion 
of the -OLD Input file 20 that exactly matches the current "chunk" of 

2 0 the NEW Input file 22. Each instance is examined .at step 70 and 

extended forward and backward to- locate the, largest, current 
^ (possibly approximate) match, where "largest" means the lcmgest 
-match . in the . forward direction with the length in the, reverse 
direction used tty break tips. The largest current match is then 
25,. compared to the match for this "chunk" already marked in the match 
table (from processing previous portions of the OLD -Input file 20). If 
the largest current match is longer in the forward . direction than the 
previously marked match or has the same forward length (plus a 
longer reverse length), the largest current match is marked in the 

3 0-. match table and, all "chunks", contained within the extent of the 

reverse portion of this match are .marked and skipped at step 72. If 
there , is no current match found, or if the largest xurrent match is 
. smaller than - the , previously, marked match, then the match table is 
not altered for this, "chunk." In this case, if approximate matching is 
3 5 being used (or_ if there is no previously marked match for this 
"chunk"), the NEW Input file 22 is moved to the next "chunk", in the 
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reverse direction, and again checked at step 64 to see if the input list 
is exhausted. If exact matching is being used, then all "chunks" 
contained within the extent of the reverse portion of the previously 
marked match are skipped. This entire process is repeated until the 
5 NEW input file 22 is exhausted as determined at stop 64. That is, the 
input file is exhausted when the first "chunk" of the file has been 
processed. 

The match table optimization process 16 is shown as a whole in 
Fig. 2. Details of the steps in match table optimization are shown in 
1 0 Fig. 6. This process operates entirely on the match table 26 without 
reference to any other data structures. The method begins by setting 
the current size to zero at step 80, and then proceeds by passing the 
match table 26 "backward" beginning with the row corresponding to 
the last "chunk" in the NEW input file 22. If a row corresponds to a 

1 5 special handling "chunk" as determined at step 90, then it is skipped. 

The value of L is checked at step 86. If a row has an L value of zero, 
thereby indicating that no match was ever located, then the "chunk" 
size is added to the current size and S of this row is set to that value 
^^^at^step^S^Sr^^ this caser^the next^row is then^ examined- in' the^reverse 

2 0 direction, checking at step 82 to see if the entire table has been 

processed. However, if the value of L for this row is nonzero, then 
the rows corresponding to "chunks" within the forward extent of the 
match described in this row are examined at step 94, and the one 
with the smallest value of S is located. The current size variable is 

2 5 set to the sum of the smallest S value located and the S value of the 

current row. S of the current row is set to the current size at step 94. 
L of the current row is adjusted downward at step 94 to stop the 
current match at the beginning of the "chunk" corresponding to the 
smallest S value located. All rows within the reverse extent of the 

3 0 current match are then skipped, after setting at step 96 all of their S 

values to the current size, and adjusting at step 98 their L values to 
reflect the new end of the current match. This entire process is 
repeated until the match table 26 is exhausted, as determined at step 
82, and then stopped at step 84. That is, the entire table has been 
3 5 processed when the first row of the table has been processed. 
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Fig. 7 is a flowchart which describes, in detail, the patch file 
encoding method 1 8 illustrated in Fig. 2. This process, takes the 
match table 26, OLD input file 20 and NEW input file 22 as inputs and 
produces the output patch file 36 as output. The method", proceeds by 
5 first setting a pointer at step 100 and then passing the match table 
26 in the forward_direction by beginning at the row corresponding t to 
the first "chunk" in the NEW input file 22, and examining each row. in 
turn until that .table is, exhausted as indicated at step 102. As the 
7 various patch, file records, are , processed ,. they are placed in a 
1 0 temporary storage means to be passed on at step 114 to the post- 
processing encoder at the end of the process,, which is terminated at 
step 116. If . the, row is .marked for special handling as determined at 
. . step 104, then a corresponding "special , handling" recprd is generated. 
., This may require examination of this region of the .NEW , input file. ,22. 
15 The .immediately.. suh:sequent_-rpws 4 >liat correspond., to "chunks'' . . 
within the. extent of the region covered by the special handling are 
skipped at step 110. : If the row is not marked, for special handling, 
[ . but has an L value of zero as determined at step 106, then an "Add" 
record is generated at step 1.1 2, which requires examination of the 

2 0 - corresponding region of " the NEW input file . 22., All immediately 

subsequent rows, that also have zero L values and that are,, not 
marked, for .special handling,, are skipped. .If, hpwever^Jhe current 
row., is * not Irnariced for special^ a nonzero L value, 

, Sen both the, OLD input file ^20 and the^ .NEW. .input- f tie .22 are 
25 examined to locate any mismatches in this (possibly approximate) 
matched region. , In, this case,, ."Qopy" . records >nd possible "mod" 
, records are generated at step,. 108, and all immediately subsequent 
1, rows that correspond to , "chunks" ; . within the extent of -this matclr are 
skipped. This entire process is .repeated. by .-returning. to the step 102 

3 0 until the match, table 26 is exhausted by. the processing of the last 

row of the table, as indicated by the check at the step 102. The patch 
. .file records in the. temporary, storage means are then passed to the 
" post processing encoder at . step 1H for , -final, optimization , and 
encoding.. . A- preferred embodiment of this encoder , is discussed in 
3 5 "7 the "operation" section below. ' The post processing encoder .. then 
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passes the completed output patch file 36 to the output means 28 
shown in Fig. 2. 

OPERATION 

5 The following discussion describes the preferred operation of 

the invention. It should be understood, however, that , there are 
alternate modes of operation which will yield desired results. There 
are a number of design parameters for the operation method that 
will impact the discussion of the method in various ways. These 
1 0 parameters will be defined, and the values for these parameters that 
are used in the preferred operational embodiment will be given. 

1. "Chunk" size: This parameter controls the minimum match size 
as well as the granularity of the match table 26 and must be a power 
of 2. Increasing this value allows one to increase efficiency at the 

1 5 expense of effectiveness. The preferred embodiment is 8, but values 

of 4, 16, etc. are certainly reasonable. 

2. Speed factor : This parameter controls the granularity of the 
string table 24. Increasing this value allows one to increase 

— -^e#f ie ie ney^ at the expense of effectiveness: The preferred 

2 0 embodiment uses user-selectable relative values between 0 and 10 

inclusive. 

3. Approximate-matching tolerance : These parameters control 
how closely two strings must coincide to be considered a "match." 
Altering these tolerances allows one to tailor the method for specific 

2 5 patterns of changes with different kinds of data. Two tolerances 

used in the preferred embodiment comprise a local tolerance and a 
global tolerance, where both are expressed as fractions. A local 
tolerance of 8/6 specifies that out of each 16 adjacent characters, 8 
would be allowed to mismatch before the match was terminated. A 

3 0 global tolerance of 1/3 specifies that the total number of mismatches 

can be at most 1/3 of the total characters in the match. The 
preferred embodiment uses user-selectable pair values of local and 
global tolerances, respectively, of (8/16, 1/3) which are generally 
used with executable files, and (0/16, 0/1), which is generally used 
3 5 with text files. Note that the latter tolerance pair requires exact (as 
opposed to approximate) matching. 
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4. Chain length limits : This parameter controls the maximum 
number of identical key values that will be chained together in the 
string table 24. It is necessary to limit this for efficiency reasons. If 

' no limit is placed on the length of a chain, then a file consisting 
5 .entirely of repeating patterns degrades the efficiency of the match 
table preparation, process (see Fig. 2) without helping its 
effectiveness. If the value for this limit is low, effectiveness suffers, 
. since some potentially useful matches will not be found. The 
preferred .embodiment uses a fixed value of 20 for this parameter, 
10 but values between 10 and 100 are certainly . reasonable. Altering 
this parameter based on available memory size is also a potentially 
helpful ramification of the preferred embodiment. • 

5. Minimum special handling size : This parameter* controls special 
handling block size to be eligible for ; special handling described 

15 below. The preferred embodiment uses a fixed size of 12 characters - • 
for this parameter. ; > = _ : r .c ; : ' ' : 

6. Coding estimation parameters : These parameters :- contrql s -,the 
estimates, used in the match table preparation and match table 
optimization, of the number of characters that will be taken to 

2 0 encode a given match. The preferred embodiment uses a. fixed ; 
• estimate - of 6 L characters plus 3 characters per mismatch in . the 
forward direction. -. ; • ' . ' v . ::■ r ; . : ; • 

Having described a representative set of operating parameters 
of the method shown in Fig. 2, the preferred, embodiment^ of the 

2 5 patch file construction method and apparatus will be described in 

detail. The OLD input file 20 and NEW input file 22 are input by 
; : means ..of memory mapped files, .. which facilitates ' the particular 
access patterns used by., the, method. ; Namely, .the method needs to 
access rapidly and repeatedly the current block of the OLD input file, 

3 0 7 while the, NEW Input file is passed sequentially backward or_ forward. 

The match' table 26 is also passed sequentially backward or forward 
so that it is also stored in a memory mapped file in mass storage. 
The string table 24 is stored in random access memory (RAM) and 
the RAM size determines how much of the OLD, "Input .file may be 
3 5 processed at a time. ... The temporary patch file storage used, for - 
buffering the patch file before it is. passed to the post processing 
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encoder 114, as shown in Fig. 7, is a dedicated RAM with overflow to 
mass storage, since there is not any a priori limit on the size of this 
temporary patch file storage. The output patch file 28 is an ordinary 
sequential access file since this is adequate for the method. 
5 The patch file construction method begins by acquiring the 

largest available memory block for use in storing the string table 25. 
The largest size of OLD Input file that can be processed into a string 
table of this size is then calculated. This becomes the size of the 
block of OLD input file that will be processed on each pass through 

1 0 the string table preparation and match table preparation shown in 

Fig. 2. 

The string table preparation 12 depends greatly upon the 
details of the particular embodiment of the string table 24 that is 
selected. In the preferred embodiment, the string table 24 consists 
15 of a modified 8-way B-tree with linked list leaves as described 
above. The string table preparation process 12 is executed by then 
consists of the following steps: 

™ — ^Sieg^t?—^ the 

2 0 portion to process, and initialize the B-tree to "empty." 

Step 2. If the key (string of the same length as a "chunk") 
present at the input pointer is eligible for special handling, skip it 
and go to step 6. 

Step 3 . Lookup the key in the B-tree. 

2 5 Step 4. If the key is not in the B-tree, insert it and begin an 

associated instance list. 

Step 5. If the key is present in the B-tree and the associated 
instance list has not reached the maximum chain length, add this 
instance to the list. 

3 0 Step 6. Add twice the speed factor plus one to the input pointer. 

If there is still a key remaining in the portion to process, go back to 
< step 2. 

"Special Handling" in the preferred embodiment means that the 
3 5 key comprises either 8 repetitions of a single character, four 
repetitions of a two-character pattern or 2 repetitions of a four- 
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character pattern. If this pattern continues for a total of at least 12 
characters, then the block is. eligible for special handling. Potentially 
helpful ramifications often include the addition of other types of 
special handling blocks. 
5 Skipping by odd amounts in step 6 (when the "chunk" size is a 

power of .2) insures that a sufficiently long match will always be 
found, even , though not every "chunk" in the OLD input file 20 is 
tabulated in the string, table 24. As an example, if the speed factor is 
equal to 10, then only the keys whose positions are a multiple of 21 

10 will be placed in the string table. .. If, however, an. exact match 
between OLD and NEW files has a length of at least 91 characters, 
then this match will be located because at least one of the NEW 
"chunks" included in the match will exactly match a key from the 
OLD file placed in the string table. 

15 The lookup and insertion procedures mentioned .above are 

standard operations on a B-Tree as described, for example, in Knuth, 
,. The Art of Computer Programming,. Vol. 3, and are hereby 
incorporated into this disclosure by reference. The method for 
associating a key value with an instance list pointer in the preferred 

2 0 embodiment ; : of : the string table .24 will, however, be described. 
.. Referring to Fig. 4, keys are located in three different types of nodes 

- in the string table which are the root node 42, interior nodes 44 and 
leaf nodes 46. . If. a key is in a leaf node 46, it. is clear where the 

— corresponding instance list pointer is located in the corresponding L 

2 5. field- of the .same. :node._. However, the . keys in the root and interior 

nodes also have associated instance lists that are panted to by the LP 
.. , fields -of various leaf nodes. The important fact to realize here is 
that there are- always , more leaf nodes than there are .keys , in all root 
and interior . nodes combined.. Specifically, one can begin at any key 
30, in a root or interior _node and arrive at. a unique leaf node by the 
following process:, (1) proceed down the tree by moving "to the right" 
,at the first, stage by using the C field whose index is one greater than 
- the index of the key, and. (2),, then taking CI pointers until a leaf is 
reached. The LP field of this leaf node points to the instance list 

3 5 associated, with the original key. 
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The match table preparation process was described above in 
the discussion of Fig. 5. To that description, details will be directed 
toward how to mark "chunks" in the match table 26 for special ' 
' handling and how to mark a match in the match table. As mentioned 
5 above, special handling is indicated in the match table by certain 
reserved values in the RL field. The preferred embodiment reserves 
values above FFFFFF00 in hexadecimal to indicate various types of 
special handling: FFFFFF01, FFFFFF02, FFFFFF03 are used to indicate 
repeated characters, repeated two-character patterns and repeated 

1 0 four-character patterns respectively. The method allows other types 

of special handling, 

To mark a match in the match table, the I field is set to the 
position in the OLD input file 20 that matches this "chunk," the L field 
is set to the maximum extent of the match in the forward direction 

1 5 (including the size of the base "chunk"), the RL field is set to the 
maximum extent of the match in the backward direction (not 
including the size of the base "chunk") and the S field is set to reflect 
the coding estimate derived from the number of mismatches in the 

— for=waFd-=ex4ension- (6 -plus- -3 - times—the -number - of mi sm a tehes" in the 

2 0 preferred embodiment). Marking the match is completed by setting 

I of the previous row to I minus the "chunk" size, L of the previous 
row to L plus the "chunk" size, RL of the previous row to RL minus 
the "chunk" size, and S of the previous row to S. This process is 
repeated as long as RL is larger than the "chunk" size. The match 

2 5 table optimization 16 process was thoroughly described above in the 

discussion of Fig. 6. 

The patch file encoding process 18 was described above in the 
description of Fig. 7. Details are now added concerning the encoding 
of the temporary patch file storage which is passed to the post 

3 0 processing encoder, as well as details of the preferred embodiment of 

the post processing encoder itself. The temporary patch file 
comprises a rectangular four-column array, with each row 
corresponding to a patch file record and comprising a code, a NEW 
file position, a length and a modifier. A code describes the record 
3 5 type and is one of the following values: 0 for "Copy," 1 for "Add," 2 
for "Mod," 3 for "Special Handling 1," 4 for "Special Handling 2," 5 for 
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"Special Handling 4." The NEW file position .indicates, the position in 
the .NEW file to which this record pertains, and the length field 
indicates its length. . The meaning of the modifier varies according to 
the. value in the code . field: for "Copy'.' records, it indicates the OLD 
5 file position from which the block is to be copied; for "Add" records it ■ 
.is unused; for "Mod" records it indicates, the difference^etween. the 
NEW. file, contents at that position and the OLD file contents that will 
have been copied into that position during patch application; for 
"Special- Handling" records it , indicates, the .. 1-, 2-, or 4-character 

10 pattern that .will be repeated through that .region. This encoding 
(with minor variation) is also used for the auxiliary table 38 in the 
patch application process shown in Fig. 3. ... . , 

Various optimizations are possible in the post processing 
encoder (See Fig. 7). The preferred embodiment performs the 

15 following steps: , ; / .: ;. 

Step 1 . Sort all "Copy" and "Special Handling" records in 
increasing order by NEW file position. . : .,- -.v.. 

Step 2 , Add 10 to the Code field of any of these recprds that, are 
2 0 not immediately preceded by a "Copy" or "Special Handling" record 
(these new codes represent "Copy With Gap," "Special Handling 1 
With Gap," etc.) 

Step 3 . Form an array, which consists of packed forms of these 
records (specific formats are given below). 

2 5 Step 4 . Add to this array another record, which is a single record 

containing all "Mod" records in packed form, sorted in order by NEW 
File Position (specific format given below). 

Step 5 . Add to this array another record, which is a single record 
containing the actual characters from all of "Add" records, sorted in 

3 0 order by NEW File Position (specific format given below). 

Step 6. Add to this array another record marking the end of the 
patch file (specific format given below). ^ 

Step 7 . Pass this entire array to any general purpose data 
compression routine. 

35 
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The preferred formats of all packed records are as follows. A 
"varindex" field is a variable-length encoding of a 32-bit unsigned 
integer (that is, an integer between 0 and 4,294,967,295 inclusive) in 
which one character is used for magnitudes between 0 and 127 
5 inclusive, two characters are used for magnitudes between 128 and 

16.383 inclusive, three characters are used for magnitudes between 

16.384 and 2,097,151 inclusive, four characters are used for 
magnitudes between 2,097,152 and 268,435,455 inclusive, and five 
characters are used for magnitudes between 268,435,456 and 

1 0 4,294,967,295 inclusive. Specific formats are as follows: 

Format for "End Of File": 
Code (16) 

1 5 Format for "Copy": 

Code (0); OLD Position (varindex); length (varindex) 

Format for "Copy With Gap": 

----- Code-(IO); Gap Size (varindex); OLD Position (varindex); length 
20 (varindex) 
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Format for "Special Handling 1": 
. .Code (3); Pattern (character); length (varindex) 

Format for "Special Handling 1 With Gap": 
5 Code (13); Gap Size (varindex); Pattern (character); 

length (varindex) 

Format for "Special Handling 2": : ... 

Code (4); Pattern (two characters); length (varindex) 

10- : - , -' '• • ■■ - •' ^ ■ 

Format for "Special Handling 2 With Gap": . . u . ,■ 

Code (14); Gap Size (varindex); Pattern (two characters); length 
(varindex) 

15 Format for "Special Handling 4".:- : . .... 

Code (5); Pattern (four characters); length; (varindex) 

Format for "Special Handling 4 With Gap": 

.'. Code (15); Gap .Size (varindex); Pattern (four characters); length 
20 ' - ; (varindex) ;":>" : ; ... ' : . : . . .•. : 

Format for "Add".: : v - -z\ : , ; ; , 

'. . Code (1); Total :Length (varindex); Characters , . _ .,. , .„ ... 
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Format for "Mod": 

Code (2); Total Number of Mods (varindex); Diffl (character); 
Pos. Incl (varindex); Diff2 (character; Pos. Inc2 (varindex; ... 

Diffx 

5 (character); Pos. Incx (varindex) 

In the last format, the "Diff" fields are the various amounts that must 
be added to a position to fix up a mismatch, and the "Pos. Inc" fields 
are the distances (position increments) between successive 

1 0 mismatches (Pos. Incl is the position of the first mismatch, Pos. Inc2 

is the distance between the first and second mismatches, etc.). 
More details on the "varindex" format follow: 

the 32 data bits of the quantity to be encoded will be 
15 denoted XO XI X2 ... X31 (from least significant to most significant); 

quantities encoded in one character set X7 through X31 to 

zero; 

quantities encoded in two characters set X14 through X31 
to zeror-and - at least one of X7- through X13 to nonzem;_quantities 

2 0 encoded in three characters set X21 through X31 to zero and at least 

one of X14 through X20 to nonzero; and 

quantities encoded in four characters set X28 through 
X31 to zero and at least one of X21 through X27 to nonzero; 
quantities encoded in five characters set at least one of X28 through 
2 5 X31 to nonzero. 



In the encoding to follow, successive characters in the encoding are 
delimited by a colon and, within a character, the bits are written 
from most significant to least significant. Then the encodings are as 
3 0 follows: 

One character: 0 X6 X5 X4 X3 X2 XI XO 

Two characters: 10 X13 X12 XI 1 X10 X9 X8 : X7 X6 X5 X4 X3 X2 XI 
XO 

3 5 Three characters: 110 X20 X19 X18 X17 X16 : X15 X14 X13 X12 XI 1 
X10 
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X9 X8 : X7 X6 X5 X4 X3 X2 XI XO 
1110 X27 X26 X25 X24 : X23 X22 X21 X20 X19 XI 8 
X17 X16 : X15 X14 X13 X12 XI J X10 X9 X8 : 
X6X5 X4X3 X2.X1 XO 
11110000 : X31 X30 X29.X28 X27 X26 X25 X24 : 

. X22 X21 X20 X19 X18 X17 X16 : X15 X14 X13 
. , X12 X11 X10 X9 X8 : .X7 X6 X5 X4 X3 X2 XI XO 

10 Patch File -Application ., ■ : -._ . 

. The preferred embodiment of the patch file application method 
and apparatus will now be described in more detail. According to 
Figs, la, lb and 3, the OLD input file 34 is input by means of a 
memory mapped file to facilitate the access patterns of the_ method, 

15 while the injut patch file : 36 is input via a normal sequential, access 
file/ The auxiliary table 38 is stored in RAM; with overflow to mass 
.storage, since- .there is no a priori . limit on the size of this table. . The 
Output NEW -file 40 is built using a memory mapped file, to facilitate 
the random access patterns needed. . : ;/ -y. : ^ , 

2 0 Referring to Fig. 3, the patch file decoder 30 essentially 

accomplishes .the reverse- of the post processing' encoder portion of 
.the patch file encoder . 18... That is, the pattern file decoder .30 
. decompresses the input patch; file 36 and decodes it into the auxiliary 
ittable 38-, ; which is formatted, in- the ■.preferred., embodiment, -exactly 

2 5 ••^nke^;the•/ v te^lpojary:.patch file ; , storage ;f of :■ the, patch -file encoder.,18. 

.More- specifically,' the auxiliary . table ,38 comprise*, a -rectangular 
four-column array, with each row corresponding to., a patch . file 
record and comprising a code, a NEW file position, a length, and a 
modifier: The; code ;, describes the . record .type and is .one. .of the 

3 0 ' following values;;. 0 for -"Copy"; 1 for "Add"; 2 for ./'Mod' J; 3 for. "Special 

Handling 1"; 4 for "Special Handling 2"; and. 5 for "Special Handling 4." 
The NEW file position indicates the position in ,the NEW file_ to which 
: this . record; pertains and the -length field indicates its^ lengthy The 
meaning: of the; .modifier varies according to., the, value in the. code 
35 field for: •- . , •' "\ • lix <. s:- •.:.}-:. ;-v 



Four characters: 

• X7 .. . 

5 . Five characters: 
X23 . .- 
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"Copy" records, it indicates the OLD file position from which the 
block is to be copied; 

"Add" records, it is unused; 

"Mod" positions, it indicates the difference between the NEW 
5 file contents at that position and the OLD file contents that will have 
been copied into that position during patch application; and for 

"Special Handling" records, it indicates the 1-, 2-, or 4-character 
pattern that will be repeated through that region. 

1 0 This auxiliary table 38, unlike the earlier table, is partitioned into 
separate sections for Copy/Special Handling, Add and Mod records. 
This decoding proceeds by the following steps: 

Step 1 . Decompression of the patch file. 

1 5 Step 2. Decoding of the various patch file records, using the 

formats described above. 

Step 3 . Synthesizing the NEW file position for each record (see 
below for more details) and placing each decoded record into the 
appropriate. -=partition of -the- -auxiliar-y^table -3 8 - - ----- — - 

20 

It is to be noted that none of the records in the input patch file 
contains a NEW file position. Rather, the NEW file position for each 
record is implicated from the fact that all the Copy and Special 
handling records were packed into the patch file in order and any 

2 5 regions not covered by these records were indicated in various Gap 

sizes. To accomplish the synthesis of the NEW file positions, the 
following method is used: 

Step 1 . Initialize the current position and the total Gap Size to 0. 

3 0 Step 2 . Read a record. If it is an "Add" or a "Mod" or an "End Of 

File" then stop this process. 

Step 3 . If this record contains a Gap, then add an entry in the 
n Add" partition of the table with length equal to the Gap Size, and 
NEW file position equal to the current position. Add the Gap Size to 
3 5 the current position and to the total Gap Size. 
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Step 4 . Add an entry to the "Copy/Special Handling" partition of 
the table with code, length and modifier taken from the record and 
NEW file position equal, to the current position. Add the length to the 
current position. ~ ' 

5 Step 5 . Go to Step .2. . 

When,, the above process is complete, the ."Mod", record, if present, 
will be -processed in a similar way.- When the "Mod" record has been 
processed, the "Add" record is processed (if present) by .adding, one 
10 additional entry to the "Add" partition of the table, which has NEW 
file position .equal, to the current position and length to the difference 
between the size, of the "Add" record and .the total Gap^Size computed 
•*. earlier. At this point, the preferred embodiment begins the NEW. file 
.... construction process. 32. , , _ ;• : 

15 The NEW File construction process reads the "Copy/Special 

, Handling" partition of the auxiliary table 3.8 and performs all the 
indicated actions in order. Then it reads the . "Mod'; partition of . the 
auxiliary table and performs all the indicated modifications. Finally, 
:it reads the "Add" portion of the auxiliary table (in order) and placed 
2 0 the decompressed characters from the "Add" record (if present --. 
these characters were not processed earlier) into the appropriate 
, place in the Output NEW file 40. f . . - 

At this point, the Output NEW file 40 is an exact duplicate of 
the Input NEW f ile^ 22 as .illustrated, in . Figs,^ Isl and lb thereby 
2 5.-. accomplishing, a stated object of the invention,. ,, 

- . While the foregoing is directed, to the .preferred, embodiments 
of the invention, the scope thereof is determined, .by the, claims, .that 
follow. . • • . •; -. . : 

., . .. What is claimed is: . .. ■ ... ' . ; 
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CLAIMS 



1. A method, for converting an OLD computer file into an updated 
NEW computer file, the method comprising the steps of: 

5 (a) identifying differences between said OLD file and a NEW 

computer file by passing both files through a patch build program; 

(b) creating a patch file based on identified differences; and 

(c) combining said OLD file and said patch file by passing 
both files through a patch apply program to obtain said NEW file. 

1 0 

2. The method of claim 1 wherein said patch file is created from 
said OLD file at a central facility, and patch files are subsequently 
distributed to a plurality of remote facilities to convert a plurality of 
the OLD files into plural NEW files and including the additional steps 

1 5 of: 

(a) creating an auxiliary table from said patch file with a 
patch file decoder subroutine; and 

(b) combining said auxiliary table, said OLD file and said 
patch— file— wk¥-a^ generating 

2 0 a copy of said NEW file. 

3. The method of claim 1 wherein the step of creation of said 
patch file comprises the steps of: 

(a) selecting a portion of said OLD file; 

2 5 (b) processing said portion of OLD file with a string table 

preparation subroutine thereby creating a string table from said 
selected portion of OLD file; 

(c) combining said portion of OLD file, said NEW file and said 
string table by a match table preparation subroutine thereby 

3 0 creating a match table having identified approximate matches 

between said OLD file and said NEW file; 

(d) repeating steps (a) through (c) until all portions of OLD 
file have been processed; and 

(e) combining said OLD file, said NEW file and said match 
3 5 table in a patch file encoding subroutine to create said patch file. 
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4. The method of . claim 3 wherein said match table subroutine 
comprises the .additional steps of: 

(a) successively selecting fixed-length substrings from the ' 
• NEW file; . " . ;. ~ ' ;■- 

5 (b) utilizing the string table to find any matching copies of 

the selected substrings in. said OLD file portion; and - ; 

■■(c) « extending . said matching substring copy to ; .a longer, 
approximate match in the forward and reverse directions. 

10 5. The method of claim 3 wherein a string table is prepared from * 
the OLD, file by. the steps: ' . ; . ^ ; : : : ^ 

(a) successively selecting fixed-length substrings from said 
portion of OLD file whose position relative to beginning of said OLD 
file .portion is ;a multiple of a fixed increment, said fixed length and 

15 fixed increment haying no common factors greater than one; and 

(b) if said selected substring does not comprise a; repeating 
pattern, recording location of said selected substrings in said string 

■ ■ table. - - - • vr ■ • ... : .... • : ■■ v; .<T ' ■ 

2 0 6; The method of claim 5 wherein substrings which have already 
been recorded in said string table a fixed number of times are not 
^recorded further in said string table until a subsequent portion of 
said OLD file is selected, : — , ^ :.: \." \. : , . .:i , : 

2 5 7. A computer program, for converting an OLD computer file, into - 

an updated NEW computer file, the : program: comprising: . ; 

(a) a, NEW file .template; pro gr^^ of 
said NEW file;^ - \'. - : ; . u*- l" . .;; . - ; r i • 

(b) a patch build program which identifies a difference 

3 0 ■. (i);,-,, identifying a portion of said OLD file to be 

converted; £ a ftd : -: ""LK -:--':.fr*- * ^c^x:; 

- <ii) ; creating a patch file; and . : , - 

(c) 7 : ,a patch apply, program which converts said - OLD filer into 
said NEW -file by combining said; .patch file with said portion ; of the 

3 5 OLD file to be converted to form* said NEW : - ( r . : ; \ : 
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8. The computer program of claim 7 further comprising: 

(a) a string table memory receiving a string table from said 
OLD file formed by a string table subroutine; 

(b) a match table memory for receiving the result of a match 
5 table subroutine which combines said OLD file, said NEW file and said 

string table to create a match table having identified approximate 
matching regions between said OLD file and said NEW file; and 

(c) a patch file encoding subroutine which combines said OLD 
file, said NEW file and said match table to create said patch file. 

1 0 

9. The computer program of claim 7 further comprising: 

(a) a patch file decoder subroutine which creates an auxiliary 
table from said patch file; and 

(b) a NEW file construction subroutine which combines said 

1 5 auxiliary table, said OLD file and said patch file for generating said 

NEW file. 

10. The process of forming smaller patch files using the program of 
claim 7 .and comprising: 

2 0 (a) inputting the OLD file to a string locating subroutine to 

collect a string table; 

(b) inputting the string table into a match subroutine 
provided with the NEW input file to form a match table; 

(c) collecting the matches from the table and non-matching 

2 5 regions from said NEW file to form a list of patches; and 

(d) encoding said patches into a patch file for sending to a 
plurality of remote locations so that the remote locations can use said 
patch apply program to convert OLD files to NEW files. 

3 0 11. A method for controlling a general purpose computer, and 

causing it to compare an OLD binary file and a NEW binary file, to 
construct a patch file containing the differences between the OLD and 
NEW binary files, and subsequently controlling a remote different 
general purpose computer to use said patch file to reconstruct and 
3 5 output said NEW file from said OLD file comprising: 
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(a) a binary file input sequence enabling the OLD and NEW 
files to be accessed repeatedly and progressively; , 

,(b) a string table -subroutine forming stored data structures 
comprising a plurality of fixed-length substrings within a selected 
5 . portion of said OLD file; 

( c ) applying, the string table subroutine K to successive 
portions of said OLD file so that the contents of said . -string table are 
constructed based on selected portions of said OLD file so that the 
. OLD file ..is completely input thereto; : 
10 . (d) . a match table, for . storing data structures describing 
.approximate matches between portions of the, NEW. file . compared to 

. the. OLD file; - , v ,■ ... . - • •. r . ' u •.: . 

(e) . inputting to the match table , the said NEW . file partitioned 
». into selected length substrings from the. OLD ffle processed through 
15 the string table wherein said substrings ; are , extended to .longer- 
approximate matches in either the forward or reverse directions; 
. ' (f) cyclically repeating string table preparation and match 
table preparation wherein repeated, successive portions ^of said ;first 
file; are processed; : v. '^t'i. : - 

20 . . (g) optimizing the match, table lengths of... the approximate 
matches to produce a patch file of smaller patches; 
. . ■,, ... ,., w (h) • forming, a completed patch file for subsequent use.; . 

(i) forming a binary computer file including, fhe^ patch r file 
from the data structures in the match table encoded into a binary 

2 5 computer , file, and., transmitted to a. patch - file driven : different , 

: .computer;^ and - . ■ ; ■<. Lc-i-c.;-:- ■ ■ 

(j) applying said patch file reconstruction to said OLD file 

- with said, encoded patch file. - - . ;t ~ ■ ; ? 

3 0 12. The method of claim 11 wherein the step of forming the_ match 

table defines a substring approximate match, to, occur; when the ratio 
of total mismatched characters to total . characters is less than a 
selected ratio, and no adjacent characters having a selected and fixed 
block size, .contains -more mismatches than a fixed and selected value 
3 5 and, said ^selected ratio, and said selected value, are input to -control 
forming the matches. 
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13. The method of claim 11 wherein match table optimization 
proceeds by moving backwards through said NEW file by fixed 
increments, and progressively recording estimates of the number of 

5 characters required to encode said NEW file from that point forward 
to an end point, and adjusting the length of the match to minimize 
said estimate. 

14. The method of claim 11 wherein said string table subroutine 
1 0 proceeds by placing into said string table data structures necessary 

to locate those substrings of said OLD file which have a fixed size and 
which begin at a location relative to the last selected portion which is 
a multiple of fixed length, and wherein said fixed length and fixed 
increment have no common factors greater than one and said 

1 5 increment is chosen from a plurality of choices. 

15. The method of claim 11 wherein substrings of said OLD file 
comprising repeating patterns of a plurality of sizes are not input 
into said^stong" table by said string table preparation method, and 

2 0 said fixed-length substrings of said NEW file comprising repeating 

patterns of a plurality of sizes are not processed by said match table 
preparation method, and repeating substrings are encoded separately 
into said patch file. 

2 5 16. The method of claim 11 wherein said patch file encoding 

method proceeds by forming patch file records of a plurality of types, 
including: 

(a) copy of records, wherein said completed patch file is 
instructed to append a copy of a region of said OLD file to 

3 0 reconstruction of NEW file, and comprising: 

(i) an identifying code; 

(ii) a location in said OLD file; and 

(iii) a length; 

(b) copy with gap records, wherein said completed patch file 
3 5 is instructed to append a gap to said reconstruction of NEW file, 
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records the position and length of said gap, and then append a copy 
of a region of said OLD file and comprising: 

(i) an identifying code; 

(ii) a gap size; 

5 (iii) a location in said OLD file; and 

...... (iy) a length;. . : . ; ^ : -' 

(c) mismatch modification records, wherein said completed 
patch file is instructed to set a current modification position to the 
beginning of said reconstruction of NEW file, and then to modify 

10 certain individual characters in said reconstruction of NEW file, and 
comprising: v : ' 

(i) an identifying code; . ^ :- \ ,.. 

(ii) a modification count; ~ - 

(iii) a plurality of modification. identifiers, comprising: 
15 _ , (1) an increment which is to be added to said 

current modification position of said 
reconstruction of NEW. file; 
(2) a difference which is added to the character 
at said modification position in said 
2 0 reconstruction of NEW file, and 

(d) , add records, wherein said completed , patch file is 
instructed to place indicated, characters in. previously recorded, gaps 
in said reconstruction of NEW file ,and : append any remaining 
characters to said reconstruction of NEW file, .and comprising: 

2 5 (i) an identifying code, v mx l; , i 

(ii) a length, and *; : w ; rr 

_ . -.. .(iii) characters to be placed is said gaps or appended to 
said reconstruction of NEW file. ,. : ... ^ 

30 ^17, - : The method and apparatus of claim .11, . wherein, said .patch, file 
encoding method proceeds by forming patch file records of . plurality 

of types, including . :> r ;- f %t v 

(a) copy records, wherein, said; patch , application method is 
instructed- to., append a copy, of a region oL said -OLD file to said 

3 5 reconstruction of NEW file, and comprising: 

(i) an identifying code, 
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(ii) a location in said OLD file, and 

(iii) a length, 

(b) copy with gap records, wherein said completed patch file 
is instructed to append a gap to said reconstruction of NEW file, 

5 record the position and length of said gap, and then append a copy of 
a region of said OLD file to said reconstruction of NEW file, and 
comprising: 

(i) an identifying code, 

(ii) a gap size, 

1 0 (iii) a location in said OLD file, and 

(iv) a length, 

(c) pattern fill records, wherein said completed patch file is 
instructed to append to said reconstruction of NEW filled with a 
repeating pattern, comprising: 

1 5 (i) an identifying code, which also identifies length of 

said pattern, 

(ii) a pattern, and 

(iii) a length, 

(d-)— - pattern with gap records, wherein said completed 

2 0 patch file is instructed to append a gap to reconstruction of NEW file, 

record the position and length of said gap and then append to said 
file a region filled with a repeating pattern, and comprising: 

(i) an identifying code, which also identifies length of 

said pattern, 

2 5 (ii) a gap size, 

(iii) a pattern, and 
(e) mismatch modification records, wherein said completed 
patch file is instructed to set a current modification position to the 
beginning of said reconstruction of NEW file, and then to modify 

3 0 certain individual characters in said reconstruction of NEW file, and 

comprising: 

(i) an identifying code, 

(ii) as modification count, 

(iii) a plurality of modification identifiers, comprising: 
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(1) an increment which is to be added to said 
current modification position of said 

, . - reconstruction of NEW file, . ■ - 

(2) a difference which is added to the character 
5 i at said modification position in said 

.. , . ' ' v ^ f > reconstruction of NEW file, . 
(f) add records, wherein said . completed patch file, is 
. instructed to place indicated characters in previously recorded gaps 
in said reconstruction of v NEW file and appends any remaining 
10. characters to .said reconstruction of NEW file, and comprising:. 

(I) an identifying code, ; . - - ,•: 

_ .. (ii) v a length, and , • r : . - r - , 

(iii) characters to be placed in said, gaps or .appended to 
said reconstruction of NEW file. - v : , ; - l c: 

15;,, , : i . :-^ri\i : .. -r. v V 7 " 

18, . The method >of claim 11 including the steps of: .y. 

..«, . (a), forming, the string table; ; : • _ v. : v: 
: (b) - forming , the : match table; 

(c) forming the patch file derived from the string and, match 

2 0 table; and , / + : f : - , ; .j-- — : ■■■ [■ 

. (d) defining; different measures of match in a matph table 
subroutine- so that the patch file is minimized in size, ;.v ^ 

- 1 9 - .The , method- of ,^claim r 1 8 including the steps of progressively 

2 5. ; building }he ■ , match ,jabje . by .repeated-, processing -of . £ successive; 

positions if the NEW file,, and ^ wherein ; repeated portions thereof, are 

processed to -completion. - v .-. • ..• '"i ; ' • : .'..\ 

'''W'-c^ai- .sse- SE.:a 'w.;.;' -,:-o--;" :;••,-:■!.:-" u\ - ■■-'X^:- .; ..v:rw 
.20. The meth^ the. match, : table, encodes 

3 0 match position, number of matching symbols in sequence, . and 

estimated number of symbols to encode the match. 

v 21. Apparatus ;~. for controlling, a general . purpose . computer to 
compare an OLD ;binary file and a NEW -binary file, .to xonstruct a 
3 5 patch file containing the, : differences between said binary .files, . and 
subsequently enabling a general purpose providing with said patch 
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file to construct said NEW file from said OLD file and patch file, 
comprising: 

(a) a file input means enabling the OLD and NEW files to be 
accessed repeatedly; 

5 (b) string table storage means storing data structures 

comprising a plurality of fixed-length substrings within a currently 
selected portion of said OLD file; 

(c) a string table preparation circuit selecting successive 
portions of said OLD file so that the contents of said string table 

1 0 storage means are constructed based on the currently selected 

portioned of said OLD file; 

(d) match table storage means storing data structures 
describing approximate matches between regions of said OLD file and 
regions of ssaid NEW file; 

15 (e) a match table preparation circuit forming from the said 

NEW file fixed-length substrings located in said OLD file by structures 
stored in the string table means, and substrings are extended 
thereby to longer approximate matches in the forward or reverse 
directions-;- 

2 0 (f) wherein said file input means drives said string table 

preparation circuit and said match table preparation circuit for 
repeated, successive portions of said OLD file, until all portions of said 
file are processed; 

(g) a match table optimization circuit controlling the lengths 

2 5 of the approximate matches in the data structures in the match table 

storage means to produce a minimized patch file; and 

(h) patch file output means for patch file encoding so that the 
data structure in the match table storage means are encoded into a 
binary computer file and transmitted by said patch file output 

3 0 means. 



22. The apparatus of claim 21 wherein said match table 
preparation circuit defines two substrings to match when the ratio of 
total characters is less than a selected proportion, and a block of 
3 5 adjacent characters contains fewer mismatches than a fixed value. 
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23. The apparatus of claim ,21 wherein said match table 
optimization circuit proceeds by moving backwards through said 
NEW file by increments of a -fixed size, and progressively makes and 
records estimates of the number of characters required to encode 
5 said NEW file from that point forward to the end, and adjusts the 
length of the .match ; used, to encode said block to minimize said 
•■estimate.;,,,; ,•• •.. ,. ., .. ■ ^ .■'<> 

24.. The apparatus of claim 21 wherein said string, table 
1 0 ' preparation circuit processes data structures .necessary to locate 
those substrings of said OLD file which have length equal to a fixed 
size, and which begin at,, a location relative, xq said currently selected 
porfi'on* Vhich'.is ji multiple .-of a . fixed s increment, and wherein said 
fixe'if size.; fandjfixed ...increment • .have. no . common factors greater than 
1 5 one and said increment is .selectable from a. plurality of choices. 



25. The apparatus, of claim 21 wherein, substrings of said OLD file 
comprising repeating patterns are not. placed into said string . table 
storage means, and said iixed-length, substrings , of said NEW file 

2 0 l', t comprising. 1 repeating,, patterns , of. a . plurality, . of sizes, .are. not . 
, processed by said match table preparation circuit,, but said repeating 
5 ' Substrings' are. .•encoded' separately : .into said patch file. : 

■■ ■ ■ ')■■'■■ . .- ■■■■■,&:■;>* '3^:V' V -J+:V;v! lib ■ ■ •':''■; •■ ■ 

26. The apparatus of claim 21 wherein said ..patch file encoding 

2 5 proceeds: by ; forming, patch file records ..of a . plurality of types, , 

including: ' - - ..: r. v • ^; . ; V; • ' • ' - • • "• *' ' ' 

(a) copy records wherein said patch file output means is 
instructed Jto - append copy-, of a region of said OLD file, to 
reconstruction of /NEW file, ^and comprising:. ■ , ? : : : 

3 0 ...... ... v ,, ~ ; ,(i):,;v an identifying code, _ ^ : £ : 0 

(ii) : r a location in said. OLD file, and , , s .. . 

. M .. (iii),, a, length; t . . . , ... . ^ ; : 

(b) copy.^with gap ,".records wherein ^said patchy file puput 
, . .means is : in.s^uctei. to ..append, a .gap to the , reconstruction of NEW file, 

3 5 ', record the 'ppktipn ^and length of . said , gap, and then copy ^ a Region of 
^, said QLD^ file to the .xeconstruction of ^said NEW file,, and pomprisirig: 
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(i) an identifying code, 

(ii) a gap size, 

(iii) a location in said OLD file, and 

(iv) a length; 

5 (c) mismatch modification records, wherein said patch file 

output means is instructed to set a current modification position to 
the beginning of said reconstruction of NEW file and then to modify 
certain individual characters in said reconstruction of NEW file, 
comprising: 
1 0 (i) an identifying code, 

(ii) a modification count, 

(iii) a plurality of modification identifiers, comprising: 

(1) an increment which is to be added to said 
current modification position of said 

1 5 reconstruction of NEW file, and 

(2) a difference which is added to the character 
at said modification position in said 
reconstruction of NEW file; 

(d) add records wherein said- pate*— file output means is 

2 0 instructed to place indicated characters in previously recorded gaps 

in said reconstruction of NEW file and appends any remaining 
characters to said reconstruction of NEW file, and comprising: 

(i) an identifying code, 

(ii) size 

2 5 (iii) characters to be placed in said gaps or appended to 

said reconstruction of NEW file. 

27. A method of processing NEW and OLD files through a patch 
build program in a computer comprising the steps of: 

3 0 (a) building a string table from the OLD file processed from 

the start to the end thereof to compile strings therein; 

(b) using strings in the string table to make matches with 
NEW file wherein the matches are stored in a match table; 

(c) for each match in the match table preparing for each 
3 5 match values of I, L, RL, and S wherein I is match location, L is 

number of consecutive approximately matching characters in the 
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forward direction,. RL is number of consecutive approximately 
matching characters in the reverse direction, and S Is estimated 
number of characters required to encode the match; . 

(d) progressively changing the values of L to find the longest 
5 match. * 

28, -. The .method of claim : 27 wherein the matches in the match 
table are progressively and dynamically changed by: : . d 

(a) '.' adding to L an adjacent chunk of the OLD file; 

10 x 1 (b) modifying S dependent on the, added chunk; : 

(c) re-evaluating the value of L after steps (a) and (b) to 
obtain new values I, L, RI and S; and 

- (d) optimizing the match table by .re-evaluating matches 
therein so that the entire table is processed by the repeated steps (a) 

1 5 • : and (b) until there are no remaining ^adjacent chunks to process. 

, .29.- The method claim .28 wherein match table optimization is 
dependent on chunk size and including the steps of changing chunk 
size. 

30. ; The method of claim 29 wherein chunk -size is increased by 
• doubling the size. < .- * :1 ; • : ■ ^ ; . ... , . -\ . , : 

31. A method for controlling ; , a, general purpose ^ computer to 
progressively compare an OLD binary file with a NEW binary file, and 

2 5 ^,. construct ;a patch file, containing the differences, between the OLD and 

. NEW binary files,-- and; to enable subsequent ^operation of a remote 
: :;V and - different general purpose computer to use said patch. - file to 
reconstruct and output said NEW file from said OLD file and 
comprising the steps of: 

3 0,- (a) forming a string table of stored data structures 

comprising a plurality of entries therein; - ; : ■\-cz^> i • i ^ 

(b) evaluating a selected portion of said OLD file so that the 
contents of said string table. : are- constructed based on selected 
portions of said OLD file; 

35 ( C ) forming a match table for storing data structures 

describing approximate matches between portions of the NEW file 
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compared to the OLD file and inputting to the match table the stored 
data structures; 

(d) cyclically repeating string table and match table 
formation wherein repeated, successive entries into said tables are 

5 optimized in match table lengths to produce a patch file of selected 
sizes; and 

(e) forming a completed patch file for subsequent use from 
the data structures in the match table encoded into a binary 
computer file to be transmitted as a patch file for a remote computer 

1 0 to enable patch file reconstruction of said OLD file with said 
transmitted patch file. 

32. The method of claim 31 including the step of forming the string 
table from the OLD file by evaluating all of the OLD file, forming the 

1 5 match table by inputting all of the NEW file for match table 

formation, and progressively extending match table and string table 
formation wherein initial matches in the match table overlap other 
matches, and extending the process until match overlap ends, 

2 0 33. The method of claim 31 including the step of forming the match 

table initially with a set of stored data structures describing 
approximate matches, and subsequently refining the approximate 
matches in an iterative process. 

2 5 34. The method of claim 31 including the step of forming the match 

table by serially processing the entire NEW file to completion, and 
thereafter optimizing the match table to reduce the size of the patch 
file. 

3 0 35. The method of claim 31 including the step of forming the patch 

file as a data stream stored on a memory media. 

36. The product made by the method of claim 35. 
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37. The product made by the remote computer when provided 
with the patch file of claim 31 and the OLD file and which comprises 
the NEW file data stream. 

5 38. The method of claim 31 wherein said string table is formed by 
locating repeated patterns in said OLD • file.^^jformnjljthe match 
table initially with approximate jnatches~ wherein the. patches are 
-progressively refined while reducing the patch file ; size. 

iiO ''•iS':'"' ' thrTmeSodHof claim 31: including the repeated step of 
evaluating the OLD file until the entire OLD_fil£js evaluated, and 
" including the step of evaluating" 'the entire NEW file for. matches with 
- the entire OLD file. ; ' ; 

15 40. The method of claim 39 including the; step of completing the 
match table with the NEW file, and optimizing the match table 
overlapping the step of completing the match table. 
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